August, 1947 


Walter V. Bingham, A.G.O., War Department;
…; Arthur I. Gates, T. C. Columbia University;
… G. Jenkins, University of Maryland; Irving Lorge, T. C. Columbia University;
Quinn McNemar, Stanford University; Willard C. Olson, University of Michigan;
… P. Porter, Swarthmore, Pennsylvania; Edward K. Strong, Jr., Stanford University;
… of Pennsylvania; Joseph Zubin, N. Y. Psychiatric Institute.





Table of Contents

Studies in Job Evaluation. 6. The Reliability of Two Point Rating Systems:
    C. H. Lawshe and R. F. Wilson 355

… of a Test of Ability to Repeat Spoken …:
    J. C. Snidecor and T. D. Hanley 397

Visual Acuity in Relation to Illumination in the Ortho-Rater:
    R. Fernzerc and S. E. Wier 406

Reliability of Anecdotal Material in the First Annual Science Talent Search:
    H. A. Edgerton, S. H. Britt, and W. B. Lemmon 413

What Does Americanism Mean to the American People?: H. C. Link

The Effect of “Look” and “Read” Directions upon the Attention Value of Illustrations
    and Texts in Magazine Advertisements: E. J. Asner and D. Kaun 431

… Students in Liberal Arts, … and Teacher Training Curricula and the
    Minnesota Multiphasic Personality Inventory: O. M. Lough

… to Winfield’s Study of the Multiple Choice Rorschach: H. L. Sisk

A Method and Tables for Obtaining Standard Errors of Differences Between
    Proportions when N₁ is Equal to N₂: W. H. Lichte





Published Bi-monthly by The American Psychological Association, Inc.
Prince and Lemon Sts., Lancaster, Pa., and
1515 Massachusetts Ave., NW, Washington 5, D. C.

Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the Act of March 3, 1879.
Copyright, 1947, by The American Psychological Association, Inc.








Journal of Applied Psychology 


Vol. 31, No. 4, August, 1947








Studies in Job Evaluation. 6. The Reliability of 
Two Point Rating Systems 


C. H. Lawshe, Jr. and R. F. Wilson 
Division of Applied Psychology, Purdue University 


The job evaluation technique of systematizing wage rates is of com-
paratively recent origin. Like other new approaches, its success has in-
spired extensive experimentation. Companies are constantly devising
new plans and trying out new methods in an attempt to facilitate their
job evaluation procedures. In most systems the description of a job is
broken down into several elements such as “skill required on the job,”
“general schooling or intelligence required on the job,” “working condi-
tions under which the job is performed,” and so on. In the point rating
procedures of job evaluation, which are the most widely used, each of
these separate elements is set up as a different scale on which the job is
rated. A job being rated is assigned a number of points in accordance
with its rated position on each scale, and the total of those points is
that job’s rating, which may later be translated into wage rates.
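The arithmetic of a point rating plan, summing the points for the degree assigned on each scale and then translating the total into a labor grade, can be sketched in a few lines. This is illustrative only: the item names, the point values per degree (`POINTS`), and the labor grade cutoffs (`GRADE_CUTOFFS`) are hypothetical and are not taken from the NEMA plan or from any plan studied here.

```python
from bisect import bisect_right

# Hypothetical point values for degrees 1..5 on each scale (not NEMA's values).
POINTS = {
    "skill": [10, 20, 30, 40, 50],
    "schooling": [5, 10, 15, 20, 25],
    "working conditions": [4, 8, 12, 16, 20],
}

# Hypothetical inclusive lower bounds of labor grades 2..5;
# totals below 40 fall in grade 1.
GRADE_CUTOFFS = [40, 55, 70, 85]


def total_points(degrees):
    """Sum the points for the degree assigned to the job on each scale."""
    return sum(POINTS[item][degree - 1] for item, degree in degrees.items())


def labor_grade(total):
    """Translate a total point rating into a labor grade (1 = lowest)."""
    return bisect_right(GRADE_CUTOFFS, total) + 1


job = {"skill": 3, "schooling": 2, "working conditions": 4}
points = total_points(job)
print(points, labor_grade(points))  # 56 3
```

Because the grade is found by bisecting a sorted list of cutoffs, changing the grade structure only means editing `GRADE_CUTOFFS`; the scales themselves are untouched.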

Earlier Findings. Studies conducted by Lawshe and others (4, 5, 6, 
7, 8) have analyzed data from several job evaluation installations which 
used different plans, and through successive factor analyses the hypothesis 
has been advanced that a limited number of factors seem to operate in 
determining job ratings through the job evaluation technique. The 
final job rank seems to be determined by judgments on a limited number 
of factors, regardless of the particular type of procedure or the number 
of point scales through which the raters arrive at the final ratings of the 
jobs. As the factor analysis of data from each successive installation was
resolved, it became more apparent that the description of these few judg-
ments was essentially the same for all of the systems which had been
studied.

In view of the small number of factors which seemed to determine 
the hierarchy of jobs as established by job evaluation, these studies ex- 
plored the possibility that abbreviations of each system would yield job 
ratings closely comparable to the job ratings produced by the lengthier 














original procedure. In each installation it was found that the rated 
values on three of the items could be used to produce ratings which cor- 
related from .93 to .99 with the original ratings. These three items were 
selected in each case by use of the Wherry-Doolittle shrinkage selection 
technique (10). In most cases, these three items selected by the Wherry- 
Doolittle process were similarly or identically defined. 

The high correlations between the abbreviated ratings and the length- 
ier system ratings immediately brought up the consideration of reliability. 
Was the agreement between the original scale and the abbreviated one as 
great as could be expected in terms of the reliability of the scale? The 
additional question was raised regarding the relative reliability of the 
long and the short scale. 

Purpose of This Study. The study reported here attempts to answer
three questions. First, what is the reliability of the total point ratings
under a job evaluation procedure which uses ten or more items? Second,
what is the reliability of total point ratings under a job evaluation plan
which uses only four items? Third, what is the reliability of each of the
items in these two systems? The answers to these questions furnish a
basis for an analytical comparison of the ways in which lengthier plans
affect the job rating process as compared with shorter plans.


Procedure 


The NEMA System. The NEMA system (3) of job evaluation was 
chosen for this study because of its wide acceptance and because its system 
of point ratings on eleven scales conformed closely to the type of procedure 
for which reliability figures were desired. This plan, which was adopted 
by the National Electrical Manufacturers Association, provides for the 


¹ The authors wish to acknowledge the capable and enthusiastic assistance of twenty
men from the industrial field who made this study possible. While the results reported
here are based upon their ratings, the conclusions drawn are solely those of the authors,
and there is no intent to imply that the participants subscribe in whole or in part to the
conclusions. Their names are reproduced with gratitude:

Carl F. Bracken, John A. Patton Management Engineers; H. L. Dawson, Carnegie-
Illinois Steel Corp.; A. C. Eckerman, Wright Aeronautical Corp.; John S. Evans,
Curtiss-Wright Corporation, Propeller Division; C. H. Fowler, AC Spark Plug Division
of General Motors; Paul H. Henning, The Warner & Swasey Co.; S. Hinds, Radio
Corporation of America; W. H. Kelly, Foote Brothers Gear and Machine Corp.; A. W.
King, Rustless Iron and Steel Corp.; T. S. Krawiec, Bendix Radio Division of Bendix
Aviation Corp.; J. A. McCormic, Kirsch Company; E. J. Mullen, I. T. E. Circuit Breaker
Company; L. J. Nartker, Delco Products Division of General Motors; Charles W. Ritt,
Republic Aviation Corp.; C. M. Russell, Minneapolis-Honeywell Regulator Company;
Clarence Schlesier, Caterpillar Military Engine Company; Merle H. Smith, Elgin
National Watch Company; Karl H. Steinman, J. I. Case Company; A. F. Trumbore,
Allis-Chalmers Manufacturing Company; J. W. Wynn, International Harvester
Company.













rating of jobs on eleven items. A job being rated is placed in one of five
degrees on each of the eleven scale items. The sum of the points assigned
to a job on the eleven items is the total point rating for that job.

The Simplified Job Evaluation System. At the same time that reliabil-
ity was being investigated on the NEMA system, an experimental sim-
plified four-item job evaluation plan was developed,² with the four items
defined in the light of the factor analysis results and the Wherry-Doolittle
shrinkage selection results obtained in the previous studies. A job
rating is obtained in the same way as under the NEMA system: by as-
signing the job to a degree category on each of the four items and totalling
the point ratings on the four items.

The Job Descriptions. Forty jobs were selected, each of which is 
commonly performed in many industries and which is familiar to most 
industrial job raters. Job descriptions were adapted from the USES Na- 
tional Job Description Series (1, 2) so as to be specific in nature. The list 
of these jobs together with their USES code numbers and job titles is 
shown in Table 1. 

Rating the Jobs. Of the twenty analysts who agreed to rate jobs for
this study, ten rated jobs using the NEMA system, and ten rated jobs
using the Simplified System. Assignment of the cooperating analysts to
the NEMA system or to the Simplified System was made by chance:
the first analyst agreeing to participate was assigned to the NEMA
system, the second to the Simplified System, the third to NEMA, and
so on, in order of receipt of their agreement to participate in the study.
Complete instructions regarding the rating procedure were furnished
to each cooperating analyst.

To avoid overloading analysts, each was asked to rate only twenty of
the forty jobs. The plan for assignment of job descriptions to the ana-
lysts is shown in Table 1 and was designed to avoid systematic error or
bias. Table 1 shows that each job was rated by five different analysts
under each job evaluation procedure. No two analysts rated the same
twenty jobs under the NEMA plan, and no two analysts rated the same
twenty jobs under the Simplified plan. Judges A and B, however,
together rated all jobs; C and D also rated all jobs, but in a different
combination. Likewise, each consecutive pair of judges constitutes “one
man,” and pairs were so treated in computing correlations. The average
intercorrelation of the ratings of these five “men” on the forty jobs was
obtained as the reliability coefficient for the total point rating under each
plan and for each of the items in the two plans.


² This simplified plan was developed by the senior author and is as yet unpublished.
The four items presently included are General Schooling, Learning Period, Surrounding
Conditions, and Job Hazards.








Table 1
The Assignment of Job Descriptions to the Cooperating Analysts
[Columns: No., Code, Job Title, and the cooperating job analysts under the NEMA Plan and the Simplified Plan; an x marks each job rated by an analyst.]













Results 


The Obtained “one against one” Reliabilities. The obtained reliabil-
ities for each of the scales in the two systems and for the point totals of
the two systems are shown in Table 2. It should be emphasized that,
on the data of this study, the figure shown in the first column is the
most likely correlation between the ratings of one rater and the ratings
of one other rater. For convenience this has been called the “one against
one” reliability.

Table 2
Obtained Reliabilities for Total Point Ratings and for the Component Rating Scores
in the NEMA and Simplified Job Evaluation Systems

Item                                    “1 against 1”   “5 against 5”
Simplified System
  General Schooling                          .79             .95
  Learning Period                            .86             .97
  Working Conditions                         .61             .89
  Job Hazards                                .51             .84
  Total Points                               .89             .98
NEMA System
  Education                                  .77             .94
  Experience                                 .82             .96
  Initiative and Ingenuity                   .78             .95
  Physical Demand                            .47             .82
  Mental or Visual Demand                    .37             .75
  Responsibility for Equipment               .41             .78
  Responsibility for Material                .40             .77
  Responsibility for Safety of Others        .54             .85
  Responsibility for Work of Others          .51             .84
  Working Conditions                         .54             .85
  Unavoidable Hazards                        .34             .72
  Total Points                               .77             .94





Two reasons suggest why the “one against one” reliability obtained in
this study should tend to be lower than the “one against one” reliability
of ratings in the industrial situation. First, the analysts in this study
would tend to visualize the forty jobs by considering similar jobs in their
own plants or industries. This could cause inconsistencies in rating which
would not occur if the raters were all working in the same plant and
visualizing the same jobs. Second, in the industrial situation the analysts
would, or should, receive uniform training and instruction in techniques
of appraising and rating jobs through a series of sessions with a job rating
instructor or supervisor. In contrast, the analysts in this study had no
chance to thrash out verbally individual differences in approach to job
rating and in interpretation of the rating instructions. This factor, which
tends to produce a spuriously low obtained reliability, would be partially
offset by the fact that the cooperating analysts in this study are probably
of higher calibre, in terms of their comprehension of rating problems, than
many job raters, since most of the cooperating analysts are themselves job
evaluation supervisors and administrators.

The “five against five” Reliabilities. Still another consideration makes
the “one against one” reliabilities shown in Table 2 inadequate as esti-
mates of the true reliabilities which would apply to the industrial situa-
tion. Rarely, if ever, does one rater determine the final ratings to be used
in a job evaluation. This function is nearly always accomplished by
“committee action” or by a panel of members; consequently, the relia-
bility of the final ratings under a job evaluation system should be estab-
lished for the mean ratings of, say, five analysts. While it is true that
one member of a panel sometimes dominates the decisions of that com-
mittee, it is the judgment of the authors that the reliability of the mean
rating of five raters more closely approximates the true industrial situ-
ation than does the reliability of the ratings of a single rater.

Accordingly, the obtained reliabilities were “stepped up” by use of
the Spearman-Brown formula³ to estimate the increased reliabilities which
would result from using the pooled ratings of a five-member committee.
The “stepped up” reliabilities are also shown in Table 2; they represent
the estimated correlation of the pooled ratings of five raters with the
pooled ratings of another five raters. This might be called the “five against
five” reliability.
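The Spearman-Brown formula predicts the reliability of the pooled ratings of k raters from the single-rater reliability r as r_k = k·r / (1 + (k − 1)·r). The short check below, a sketch rather than the authors' own computation, applies it with k = 5 to the “1 against 1” column of Table 2, and with k = 2 to the two total-point reliabilities, reproducing the two-rater estimates of .87 and .94 quoted in the footnote on the Spearman-Brown formula. The dictionary simply transcribes Table 2.

```python
def spearman_brown(r, k):
    """Predicted reliability of the pooled ratings of k raters, given the
    reliability r of a single rater."""
    return k * r / (1 + (k - 1) * r)


# Table 2: item -> ("1 against 1", "5 against 5")
TABLE_2 = {
    "Simplified: General Schooling": (0.79, 0.95),
    "Simplified: Learning Period": (0.86, 0.97),
    "Simplified: Working Conditions": (0.61, 0.89),
    "Simplified: Job Hazards": (0.51, 0.84),
    "Simplified: Total Points": (0.89, 0.98),
    "NEMA: Education": (0.77, 0.94),
    "NEMA: Experience": (0.82, 0.96),
    "NEMA: Initiative and Ingenuity": (0.78, 0.95),
    "NEMA: Physical Demand": (0.47, 0.82),
    "NEMA: Mental or Visual Demand": (0.37, 0.75),
    "NEMA: Responsibility for Equipment": (0.41, 0.78),
    "NEMA: Responsibility for Material": (0.40, 0.77),
    "NEMA: Responsibility for Safety of Others": (0.54, 0.85),
    "NEMA: Responsibility for Work of Others": (0.51, 0.84),
    "NEMA: Working Conditions": (0.54, 0.85),
    "NEMA: Unavoidable Hazards": (0.34, 0.72),
    "NEMA: Total Points": (0.77, 0.94),
}

for item, (one, five) in TABLE_2.items():
    predicted = round(spearman_brown(one, 5), 2)
    print(f"{item:43s} {one:.2f} -> {predicted:.2f} (Table 2: {five:.2f})")

# Two-rater estimates for the total point ratings.
print(round(spearman_brown(0.77, 2), 2), round(spearman_brown(0.89, 2), 2))  # 0.87 0.94
```

Every stepped-up value, rounded to two places, matches the “5 against 5” column, which indicates that the column was produced by the formula with k = 5.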

However, the authors do not maintain that the reliabilities obtained
in this study are absolute. The results of the study should be qualified
and interpreted by each reader in accordance with his evaluation of the
statistical assumptions involved. With special reference to the consider-
ations which would tend to make the “one against one” reliabilities too
low, the “five against five” reliabilities become more acceptable as a rea-
sonable estimate of the true reliability which is likely to occur in the in-
dustrial situation.

³ The Spearman-Brown formula has repeatedly established its validity in predicting the
increased reliability which would result from pooled ratings. In a short empirical test
made in this study, the increased reliabilities which would result from pooling the
ratings of two raters were estimated by the Spearman-Brown formula to be .87 for the
total point ratings under the NEMA system and .94 for the total point ratings under
the Simplified System. The actual correlations were then obtained by using the mean
of the first two ratings on each job against the mean of the second two ratings on each
job. The actual “two against two” reliabilities, discussed below, were .91 for the NEMA
system and .95 for the Simplified System. Thus in both cases the Spearman-Brown
formula gave an estimate well within the limits of normal expectancy and on the
conservative side.

Comparison of the Systems. In comparing the two systems, the first
item of interest is the reliability of the total point ratings arrived at
through the two systems. The obtained reliability coefficient for NEMA
total points is .77 and for Simplified total points is .89. Since some
statisticians do not consider the formulas for the significance of a differ-
ence applicable to reliability coefficients,⁴ it will suffice to observe that
these obtained figures favor the conclusion that ratings obtained through
the Simplified System are more reliable than ratings obtained through
the NEMA system. The correlation between the average total point
NEMA ratings on the forty jobs and the average total point ratings under
the Simplified System was .90.
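The significance test described in the footnote to this paragraph can be reconstructed approximately. The sketch assumes the large-sample standard error of a Pearson r, (1 − r²)/√N, and the usual standard error of a difference between correlated quantities, √(σ₁² + σ₂² − 2·r₁₂·σ₁·σ₂); the footnote does not print its exact formulas, so both are assumptions. With r = .77 and .89, N = 40, and r₁₂ = .90, it gives a critical ratio a little above 3, in line with the “3 plus” the footnote reports.

```python
from math import sqrt


def se_r(r, n):
    """Large-sample standard error of a Pearson r (assumed form)."""
    return (1 - r**2) / sqrt(n)


def critical_ratio(r1, r2, n, r12):
    """(r2 - r1) divided by the standard error of the difference between two
    r's whose underlying measures correlate r12."""
    s1, s2 = se_r(r1, n), se_r(r2, n)
    se_diff = sqrt(s1**2 + s2**2 - 2 * r12 * s1 * s2)
    return (r2 - r1) / se_diff


# NEMA total points (.77) versus Simplified total points (.89).
t = critical_ratio(0.77, 0.89, n=40, r12=0.90)
print(round(t, 2))  # 3.19
```

The term −2·r₁₂·σ₁·σ₂ is what makes the high correlation (.90) between the two sets of ratings matter: it shrinks the standard error of the difference, and so raises the critical ratio, compared with treating the two coefficients as independent.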

Practical Significance. In a problem of this type it is desirable to ex-
amine the operating significance of these reliability differences. What do
these differences mean in terms of labor grade displacements? Because
of the limited number of judges it was impossible to interpret the “five
against five” correlations in this fashion, but a comparison of “two against
two” was possible. Means of the first two NEMA judgments were cor-
related with the means of the second two NEMA judgments for the forty
jobs, and an r of .91 was obtained. Similarly, when the means of the first
two Simplified ratings were correlated with the means of the second two
Simplified ratings, an r of .95 was found. Labor grade divisions for the
Simplified System were established so as to be comparable (in S. D. units)
to those in the NEMA plan, and the discrepancies for each system are
shown in Table 3. Note that for the NEMA plan the means of two sets
of ratings, when compared with the means of two other sets of ratings,
placed 14 of the 40 jobs, or 35%, in the same labor grade. In the Simplified
System 17, or 43%, were placed in the same labor grade. Further examination
of the table shows that if arbitrary limits of “two labor grades or less” are
established, 90% of the jobs fall within the limits under the NEMA plan,
while 98% do so under the Simplified plan. It should be kept in mind

⁴ Using the standard formula for the significance of the difference between correlation
coefficients (measures correlated .90), a “t” value of 3 plus is obtained, which supports the
above difference in reliabilities at better than the 1% level of significance. Two extremely
conservative measures were used to obtain this “t” value. First, in obtaining the standard
deviation of the reliability coefficients, the formula for the standard error of a simple
Pearsonian r was used. Second, the standard errors of the reliability coefficients were
calculated by using N = 40. This N would be appropriate only if two ratings had been
used to establish the reliabilities. Because the intercorrelation of five ratings on each of
forty jobs was used to establish the reliabilities, the use of N = 40 results in a highly
conservative figure in estimating the significance of the obtained differences.







Table 3
Discrepancies between Labor Grade Placements of 40 Jobs by Two Mean Ratings,
Each Based upon Two Judgments
(for Both NEMA and Simplified Systems)

Deviations in               NEMA            Simplified
Labor Grades             f     cf %         f     cf
None                    14      35         17     17
One Grade               16      75         14     31
Two Grades               6      90          8     39
Three Grades             2      95          0     39
Four Grades              1      98          1     40
Five Grades              1     100          0     40

Mean Deviation            1.08               .85











that the picture for both systems would appear better if a “five against 
five” comparison were possible. However, the best assumption is that 
the amount of agreement between raters would still be relatively greater 
with the Simplified System. 
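The cumulative percentages and mean deviations in Table 3 follow directly from the frequency columns. A minimal check, transcribing the f columns of Table 3 for discrepancies of 0 to 5 labor grades:

```python
DEVIATIONS = [0, 1, 2, 3, 4, 5]  # size of the discrepancy, in labor grades
FREQUENCIES = {
    "NEMA": [14, 16, 6, 2, 1, 1],
    "Simplified": [17, 14, 8, 0, 1, 0],
}


def summarize(freqs):
    """Return the cumulative per cents and the mean deviation in labor grades."""
    n = sum(freqs)
    cumulative, running = [], 0
    for f in freqs:
        running += f
        cumulative.append(100 * running / n)
    mean_deviation = sum(d * f for d, f in zip(DEVIATIONS, freqs)) / n
    return cumulative, mean_deviation


for system, freqs in FREQUENCIES.items():
    cumulative, mean_dev = summarize(freqs)
    print(system, cumulative, round(mean_dev, 3))
```

For NEMA the cumulative per cents are 35, 75, 90, 95, 97.5, and 100, with a mean deviation of 1.075; for the Simplified System they are 42.5, 77.5, 97.5, 97.5, 100, and 100, with a mean deviation of .85. Rounded, these give the 35% and 43% same-grade figures, the 90% and 98% within-two-grades figures, and the mean deviations of 1.08 and .85 cited above.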

Comparison of Items. The most reliable single scale in the Simplified
System (Learning Period) was the same as the most reliable scale in the
NEMA system (Experience). Both scales connote apprenticeship period,
or the amount of time necessary for a worker to learn to do the job. The
second most reliable scale in the Simplified System (General Schooling)
measures about the same element as the next two most reliable scales in
the NEMA system (Education, and Initiative and Ingenuity). These
scales connote a level of development which the worker must have to per-
form the job. The third most reliable item is the same for the Simplified
System (Surrounding Conditions) as for the NEMA system (Working
Conditions).

The least reliable scale in the Simplified System (Hazards) was the 
same as the least reliable scale in the NEMA system (Unavoidable Haz- 
ards). Each scale in the Simplified System had a higher reliability than 
its counterpart in the NEMA system. It also seems significant that the 
obtained reliabilities of five of the eleven NEMA items are lower than the 
lowest Simplified value. 
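Both comparisons in this paragraph can be checked directly against Table 2. The sketch pairs each Simplified item with the NEMA scale or scales that the preceding paragraph names as its counterpart. That pairing is the authors' judgment, transcribed here, not something computed. Item names follow Table 2, which labels the Simplified conditions item “Working Conditions.”

```python
# "1 against 1" reliabilities from Table 2.
SIMPLIFIED = {"General Schooling": 0.79, "Learning Period": 0.86,
              "Working Conditions": 0.61, "Job Hazards": 0.51}
NEMA = {"Education": 0.77, "Experience": 0.82, "Initiative and Ingenuity": 0.78,
        "Physical Demand": 0.47, "Mental or Visual Demand": 0.37,
        "Responsibility for Equipment": 0.41,
        "Responsibility for Material": 0.40,
        "Responsibility for Safety of Others": 0.54,
        "Responsibility for Work of Others": 0.51,
        "Working Conditions": 0.54, "Unavoidable Hazards": 0.34}

# Counterpart scales as named in the text.
COUNTERPARTS = {
    "Learning Period": ["Experience"],
    "General Schooling": ["Education", "Initiative and Ingenuity"],
    "Working Conditions": ["Working Conditions"],
    "Job Hazards": ["Unavoidable Hazards"],
}

# Is every Simplified item more reliable than each of its NEMA counterparts?
each_higher = all(SIMPLIFIED[s] > NEMA[n]
                  for s, names in COUNTERPARTS.items() for n in names)

# Which NEMA items fall strictly below the least reliable Simplified item?
lowest_simplified = min(SIMPLIFIED.values())
below = sorted(n for n, r in NEMA.items() if r < lowest_simplified)
print(each_higher, len(below), below)
```

The comparison is strict, so Responsibility for Work of Others (.51, equal to Job Hazards) is not counted among the five NEMA items below the lowest Simplified value.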

Statistical Assumptions in the Comparisons of the Systems. The pri- 
mary statistical assumption involved in the foregoing comparison of the 
two systems is that the ten raters who used the Simplified System are from 
the same population of raters as the ten raters who used the NEMA sys- 
tem. Putting it another way, it is assumed that the rating ability of the 
two groups of raters is equal and comparable. Supporting this assump- 
tion is the fact that all of the cooperating raters were selected from a group 













of men in industry who work professionally in job evaluation and related 
fields, and the fact that their assignment to one system or the other was 
arrived at purely by chance. 

The Skill Factor. Analysis of the relative reliability of the various 
scales can be carried a bit further than merely observing that the same 
scales tended to be reliable or unreliable in both systems. Three scales in 
the NEMA system and two scales in the Simplified System are designated 
as scales measuring “skill.” It is significant to note that these five scales 
have a relatively high reliability, and that their reliability coefficients 
form a cluster of values from .77 to .86. All of the other scales con- 
sidered as a group have relatively low reliabilities, and their reliability 
coefficients form a significantly lower cluster of values from .34 to .61. 
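The split into a “skill” cluster and a remainder can be read off Table 2. The sketch assumes that the five skill scales are Education, Experience, and Initiative and Ingenuity in the NEMA system, and General Schooling and Learning Period in the Simplified System. The text does not list them by name, but this grouping reproduces the range of .77 to .86 it gives.

```python
# "1 against 1" reliabilities from Table 2, grouped under the assumption above.
SKILL = {"NEMA Education": 0.77, "NEMA Experience": 0.82,
         "NEMA Initiative and Ingenuity": 0.78,
         "Simplified General Schooling": 0.79,
         "Simplified Learning Period": 0.86}
OTHER = {"NEMA Physical Demand": 0.47, "NEMA Mental or Visual Demand": 0.37,
         "NEMA Responsibility for Equipment": 0.41,
         "NEMA Responsibility for Material": 0.40,
         "NEMA Responsibility for Safety of Others": 0.54,
         "NEMA Responsibility for Work of Others": 0.51,
         "NEMA Working Conditions": 0.54, "NEMA Unavoidable Hazards": 0.34,
         "Simplified Working Conditions": 0.61, "Simplified Job Hazards": 0.51}

for label, group in (("skill", SKILL), ("other", OTHER)):
    values = group.values()
    print(f"{label}: {min(values):.2f} to {max(values):.2f}, "
          f"mean {sum(values) / len(values):.2f}")
```

The two ranges do not overlap: the least reliable skill scale (.77) is well above the most reliable non-skill scale (.61).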

In light of the other studies (4, 5, 6, 7, 8) on job evaluation mentioned
before, it is particularly significant to note the higher reliability of the
skill-measuring scales. The factor analyses of the several installations
of different systems showed that a general “skill demands” factor accounts
for 77% to 99% of the variation in total point job ratings in the install-
ations studied. Those studies have shown that “skill” is the principal
determinant of job rating, and the present study might be interpreted to
show that “skill demand” is so strongly traditional in the determination
of job level that raters are much more practiced and reliable in their ana-
lysis and rating of skill requirements than in their rating of any other
factor.


Summary and Conclusions 


Ten cooperating raters from the industrial field submitted job evalu-
ation rating data using the eleven-scale rating system of the NEMA job
evaluation plan, and another ten cooperating raters from the industrial
field submitted rating data using the experimental four-scale Simplified
System of job evaluation. Both groups of raters used the same forty job
descriptions as a basis for their ratings.

Reliabilities for the total point ratings under both systems and for the 
individual items within the two systems were obtained by computing the 
average intercorrelation of the ratings. These average intercorrelations 
of one rater with another rater were then stepped up by the Spearman- 
Brown formula to obtain an estimate of the effective reliability of the 
ratings as performed by a panel or committee in the usual industrial sit- 
uation. 

The following conclusions are supported: 


1. The Simplified System demonstrated a higher reliability (.89 for one
rater, .98 for five raters) than the NEMA system (.77 for one rater, .94
for five raters).




2. A strikingly similar pattern of reliabilities was shown by the in-
dividual scales within the two systems, with the “skill” measuring items
showing relatively high reliability and all other items showing relatively
low reliability.

3. The four individual items in the Simplified System showed con- 
sistently higher reliability than their counterpart scales in the NEMA sys- 
tem and five of the NEMA items yielded lower reliabilities than the least 
reliable Simplified item. 

4. The Simplified System total point reliability was higher than the 
reliability of any one of its component items while the NEMA total point 
reliability was lower than the reliability of two of its component items. 

5. The low reliabilities of the items, other than the skill measuring 
scales, reflect the need for more objective rating criteria for these elements, 
care in the construction of these items and the degree definitions, and in- 
dicate the probable advantage of training raters in the use of these scales. 

6. The results of this study demonstrate that a system employing a 
few items is not necessarily less reliable than a widely accepted system 
employing a greater number of items. 


References 


1. Job descriptions for industrial service and maintenance jobs. National Job Descrip-
tions Series, U. S. Dept. of Labor, U. S. E. S., 1 vol., June, 1939.

2. Job descriptions for job machine shops. National Job Descriptions Series, U. S. Dept.
of Labor, U. S. E. S., 1 vol., April, 1938.

3. Job rating: definitions of the factors used in rating jobs, hourly rated occupations.
Chicago: Industrial Relations Department, National Electrical Manufacturers
Association, 1938. Pp. 22.

4. Lawshe, C. H., Jr. Studies in job evaluation. 2. The adequacy of abbreviated
point ratings for hourly-paid jobs in three industrial plants. J. appl. Psychol.,
1945, 29, 177-184.

5. Lawshe, C. H., Jr. Studies in job evaluation. 3. An analysis of point ratings for
salary paid jobs in an industrial plant. J. appl. Psychol., 1946, 30, 117-128.

6. Lawshe, C. H., Jr., and Alessi, S. L. Studies in job evaluation. 4. Analysis of
another point rating scale for hourly-paid jobs and the adequacy of an abbreviated
scale. J. appl. Psychol., 1946, 30, 310-319.

7. Lawshe, C. H., Jr., and Satter, G. A. Studies in job evaluation. 1. Factor analy-
sis of point ratings for hourly paid jobs in three industrial plants. J. appl.
Psychol., 1944, 28, 189-198.

8. Lawshe, C. H., Jr., and Wilson, R. F. Studies in job evaluation. 5. An analysis
of the factor comparison system as it functions in a paper mill. J. appl. Psychol.,
1946, 30, 426-434.

9. Peters, C. C., and van Voorhis, W. R. Statistical procedures and their mathematical
bases. New York: McGraw-Hill Book Co., 1940.

10. Stead, W. H., Shartle, C. L., and Associates. Occupational counseling techniques.
American Book Company, 1940. Pp. 245-252.





Labor Turnover and Its Correlates * 


Willard A. Kerr 
Tulane University 


The industrial plant is in itself an adult community with many psy- 
chological and economic forces interacting to influence the behavior of 
its personnel. A scientific approach to understanding any variable in 
industrial psychology and personnel relations demands the quantitative 
examination of that variable within its context of influencing factors. 
Fortunately, the psychologist is now able to bring an increasing
number of industrial variables into various types of analyses for the
mutual guidance of management and labor.

All coefficients of correlation reported in this paper are tetrachoric,¹
and their significance, with the five per cent level accepted, was deter-
mined through use of T. L. Kelley’s tetrachoric reliability formula and
the Guilford-Lyons tables.²
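The footnote to this paragraph cites the Chesire-Saffir-Thurstone computing diagrams, presumably the means by which the coefficients were read. For readers without the diagrams, a well-known quick approximation, swapped in here and not the method used in the paper, is the “cosine-pi” formula for a fourfold table with cells a, b (top row) and c, d (bottom row): r_t ≈ cos(π / (1 + √(ad/bc))). The counts in the example are hypothetical.

```python
from math import cos, pi, sqrt


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r for the fourfold table
        [[a, b],
         [c, d]]
    where a and d are the cells in which the two dichotomies agree."""
    if a * d == 0 and b * c == 0:
        raise ValueError("both diagonal products are zero; r is undefined")
    if b * c == 0:
        return 1.0  # no disagreements at all: perfect positive association
    if a * d == 0:
        return -1.0  # no agreements at all: perfect negative association
    return cos(pi / (1 + sqrt(a * d / (b * c))))


# Hypothetical split of seven departments, high vs. low on each variable.
print(round(tetrachoric_cos_pi(3, 1, 1, 2), 2))
```

When ad equals bc the ratio is 1, the argument is π/2, and the estimate is zero, as it should be for no association. With only seven departments, as in the study described below, any such estimate carries a very large sampling error.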

The first and limited application of the present type of approach was 
conducted in an Indianapolis, Indiana, electronics factory in 1942 in an 
attempt to determine the relationship of twenty-four specific variables 
to labor turnover rates of seven major manufacturing departments which 
together employed approximately 3000 workers. 

Variables which were quantified in each of the twenty-three depart-
ments and correlated with labor turnover include per cent of hourly-paid
(i.e., paid according to number of hours worked) employees belonging to
the company athletic association, per cent of all employees belonging to
the athletic association, per cent of employees who are hourly-paid, per
cent of all who are male, average hours worked by hourly-paid males,
females, and both combined, average hourly non-overtime earnings of
hourly-paid males, females, and both combined, average sex wage differ-
ential, average decibel noise level, sex ratio imbalance, absenteeism rate
of hourly-paid employees, average supervisory quality, social prestige of
average job, difficulty of average job, monotony of average job, promotion

* The author acknowledges with appreciation the critical suggestions of Dr. H. M.
Johnson.

¹ Chesire, L., Saffir, M., and Thurstone, L. L. Computing diagrams for the tetrachoric
correlation coefficient. Chicago: University of Chicago Bookstore, 1933.

² Guilford, J. P., and Lyons, T. C. On determining the reliability and significance
of a tetrachoric coefficient of correlation. Psychometrika, 1942, 7, 243-249.




probability for average worker, average supervisory intelligence (group 
sampling test) and average ages of males, females, and all employees. 
Nineteen of these variables represent objective records and the remaining 
five represent the average ratings of eight individuals well acquainted 
with all divisions of the factory (including two psychologists, the work 
simplification director, the education and training director, the employ- 
ment manager, the personnel services manager, the recreational director, 
and the plant supervisor of office service and supply). The objective data 
in each department represent three-month averages. 


Table 1
Tetrachoric Coefficients of Correlation between Departmental Labor Turnover and
Each of Twenty-four Other Variables in an Indiana Factory

Note: N = seven major factory manufacturing divisions, each employing an average
of more than 400 workers. Coefficients significant at 5 per cent level are set in
bold face type.

  No.     r    Variable
   1.   .10    Per cent of hourly-paid employees belonging to company athletic association
   2.   .16    Per cent of all employees belonging to company athletic association
   3.   .50    Number of employees in department
   4.   .31    Per cent of employees who are hourly paid
   5.  -.49    Per cent of all employees who are male
   6.  -.18    Average hours worked by hourly-paid males
   7.  -.08    Average hours worked by hourly-paid females
   8.  -.64    Average hours worked by hourly-paid employees
   9.  -.71    Average hourly non-overtime earnings of hourly-paid males
  10.  -.48    Average hourly non-overtime earnings of hourly-paid females
  11.  -.31    Average hourly non-overtime earnings of hourly-paid employees
  12.  -.19    Sex wage differential (9 minus 10)
  13.   .12    Average decibel noise level
  14.  -.10    Sex ratio imbalance (hourly paid)
  15.  -.31    Absenteeism rate (hourly paid)
  16.  -.40    Average supervisory quality
  17.  -.56    Social prestige of average job
  18.  -.59    Difficulty of average job
  19.   .73    Monotony of average job
  20.  -.76    Promotion probability for average worker
  21.   .62    Average supervisory intelligence (alpha)
  22.   .16    Average age of males
  23.  -.68    Average age of females
  24.  -.26    Average age of all employees





Results of this correlational analysis are shown in Table 1. It is
apparent from inspection of the coefficients in bold face in this table
that only three of the relationships reach the level of statistical
significance. However, these three appear to be not only statistically
significant but also highly significant in their practical implications
for industrial harmony and efficiency. The first of these variables,
hourly earnings of male workers, is the best known of the three for its
relationship to labor turnover and to that more emphatic mark of
discontent, the strike; however, it may be less generally appreciated
that women are somewhat less influenced by the wage factor in their job
stability (r = -.48 for female earnings against -.71 for male earnings).
Still less recognized, perhaps, is the importance of either job monotony
or promotion probability as a potential and partial determinant of a
factory department's turnover rate. Yet these data from a rather typical
midwestern factory point strongly toward such a conclusion. Acceptance of
the conclusion indicates, of course, a need for additional emphasis upon
research in monotony tolerance and upon determination of the “minimal
promotions” which will act as tenure incentives.
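The coefficients in Table 1 are tetrachoric: each variable is split into two classes and the correlation of the assumed underlying normal variables is estimated from the resulting 2 x 2 table. The author read the values from the Chesire, Saffir, and Thurstone computing diagrams (footnote 1), which are not reproduced here. Purely as an illustration, the sketch below substitutes a different, well-known shortcut, the cosine-pi approximation r ≈ cos(π / (1 + √(ad/bc))). The function name and the cell counts are hypothetical, and the approximation will not reproduce the published coefficients exactly.

```python
import math


def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Cosine-pi approximation to the tetrachoric r of a 2 x 2 table.

    Hypothetical layout (departments split at the median of each variable):
                        variable high   variable low
        turnover high         a              b
        turnover low          c              d
    """
    if min(a + b, c + d, a + c, b + d) == 0:
        raise ValueError("every row and column needs at least one department")
    if b * c == 0:
        return 1.0   # no discordant cells: the approximation reaches +1
    if a * d == 0:
        return -1.0  # no concordant cells: the approximation reaches -1
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))


# Hypothetical 53 departments in which high turnover tends to go with low
# values of the variable, so the approximate r comes out negative.
print(round(tetrachoric_cos_pi(6, 20, 20, 7), 2))
```

When ad equals bc the approximation gives zero, and it moves toward +1 or -1 as one diagonal of the table empties, which is the qualitative behavior of the tetrachoric coefficient.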

In 1943 it was possible to repeat and expand the same type of
correlational approach in the Camden, New Jersey, electronics factory of
the RCA Victor Division of Radio Corporation of America. Turnover in this
research was defined in three different ways, i.e., three different
measures of turnover were compared: (1) labor turnover, (2) avoidable
turnover (i.e., not due to death, pregnancy, or military service), and
(3) avoidable separation rate. Avoidable separation rate was found to be
least influenced by sporadic hirings in an expanding work force, and to
yield more coherent coefficients with the other variables studied. Data
on avoidable separation rate and forty other variables for each of
fifty-three factory departments cover a period of thirteen weeks or
longer and represent statistics from approximately 10,000 employees.

It should be noted that the data on intelligence of supervisors must be
interpreted with extreme caution, because only one-fourth of the
supervisors were tested and it is not certain that those tested are
representative of those not tested.

Eleven of the 40 correlations obtained are statistically significant at
the five per cent level or better, as indicated in Table 2. On the basis
of these findings it is possible to conclude that in this New Jersey
factory the departments worst in avoidable separation rate tend to have a
larger per cent of females among hourly-paid employees, fewer hours
worked per week per hourly-paid female, lower hourly earnings of
hourly-paid males, greater sex hours differential, less sex wage
differential, less “economic goodness,” lower social prestige for average
job, greater job monotony, a wage incentive system, lower morale (union
rating), and shorter job tenure. It should be noted that sex wage
differential correlates negatively with turnover simply because a low sex
wage differential means low average wages for hourly-paid males. Whether
any one of the findings on earnings, job prestige, job monotony, or
incentive wage system should be regarded as causal is extremely difficult
to determine because of a tendency for such factors to be interrelated,
i.e., to be present in undesirable quantities in the same problem
departments.

Table 2
Tetrachoric Coefficients of Correlation between Departmental Avoidable Separation
Rate (Labor Turnover) and Each of Forty Other Variables in a New Jersey Factory

Note: N = 53 factory departments, each employing an average of 189 workers.
Coefficients significant at 5 per cent level are set in bold face type.

  No.     r    Variable
   1.   .10    Number of hourly employees
   2.   .05    Number of all employees
   3.   .13    Per cent of employees who are hourly paid
   4.   .30    Amount of supervision (hourly employees + supervisors)
   5.  -.68    Per cent of hourly-paid who are male
   6.  -.13    Average hours worked per week by hourly-paid males
   7.  -.53    Average hours per week by hourly-paid females
   8.   .18    Per cent of salary workers who are male
   9.  -.81    Average non-overtime earnings of hourly-paid males
  10.  -.13    Average non-overtime earnings of hourly-paid females
  11.   .38    Sex hours differential
  12.  -.52    Sex wage differential
  13.   .17    Intra-company transfer mobility
  14.  -.64    Economic goodness (composite of low turnover, high efficiency, and
               many good employee suggestions)
  15.  -.11    Sex ratio imbalance
  16.   .02    Per cent of employees who are salaried: male
  17.  -.11    Per cent of employees who are salaried: female
  18.  -.05    Per cent membership in company athletic association
  19.   .03    Accident frequency
  20.  -.20    Accident severity
  21.  -.10    Productive efficiency (official)
  22.  -.03    Productive efficiency (average rating of ten competent judges)
  23.  -.27    Security of average worker's job*
  24.  -.18    Average supervisory quality*
  25.  -.20    Promotion probability*
  26.  -.39    Social prestige of average job*
  27.   .47    Monotony of average job*
  28.  -.28    Degree of “completion of work”†
  29.   .28    Fertility of field for suggestions†
  30.   .24    Suggestion quota†
  31.  -.19    Suggestions submitted
  32.   .27    Per cent of suggestion quota met
  33.  -.20    Per cent of suggestions adopted
  34.   .40    Wage incentive system
  35.   .23    Average decibel noise level
  36.  -.10    Departmental morale (personnel manager-union average rating)
  37.  -.02    Morale as rated by personnel manager
  38.  -.44    Morale as rated jointly by union president and vice president
  39.   .32    Youthfulness (per cent of employees age 25 and under)
  40.  -.52    Tenure (per cent one year or more with company)

* Variables 23 through 27 each based on an average rating of twelve competent
judges.
† Variables 28, 29, and 30 based on the plant suggestions supervisor.

The validity of the morale data (ratings made by both of the highest
officers of the union in the presence of the author) is supported by
their significant correlation with avoidable separation rate: whatever it
is that the union leaders think of when they think of morale, it
apparently is at least partly the same thing that employees lack when
they quit the company for avoidable reasons. Money is a factor in
separations, but it is not the only factor; in fact, two of the important
factors, job prestige and monotony, seem to be almost entirely
psychological.

Labor turnover was not found to be significantly (negatively) related to
the productive efficiency of departments (r = -.10 for official
efficiency and -.03 for the judges' ratings). The best explanation for
this rather surprising result appears to be the following. Discontented
workers in high-turnover departments (which generally are the low-pay
departments too) simply terminate employment when thoroughly
discontented, whereas in the low-turnover, higher-pay departments the
discontented individuals hang on longer because they do have better jobs;
nevertheless, they have the net effect of keeping the efficiency of the
low-turnover departments at a lower ebb than one would expect.

Likewise, per cent departmental membership in the company athletic
association fails to show any indication of a significant reduction of
labor turnover (r = -.05). The association charges employees a membership
fee of twenty-five cents per month but is also subsidized by generous
grants from the management to support its extensive program of dances,
plays, and intramural sports. This finding, of course, does not negate
the other obvious values of such recreation associations, which now are
being advanced by Dr. Eastwood and the Industrial Recreation Association.

The significant tendency for departments using an incentive wage system
to have greater labor turnover may be due in part to a particular dislike
on the part of less efficient employees for the competitive situation. If
this is true, the dislike, when translated into turnover statistics, looks
bad on paper but may nevertheless be a “survival-of-the-fittest” type of
purifying influence whose net result is to maintain high efficiency in
incentive wage departments. The fact already observed in this study, that
high-turnover departments are not lower than other departments in
productive efficiency (even though the high-turnover departments tend to
have more monotonous work and other penalizing psychological
characteristics), tends to substantiate the implication that incentive
wage systems are successful in keeping up the productive efficiency of
departments which possess certain definite psychological handicaps.

In view of the foregoing comments, the reader probably will not be
surprised to learn that the plant personnel manager and the union leaders
(president and vice-president of the CIO local) were in almost total
disagreement as to which departments were highest and lowest in job
satisfaction. Probably each party's rating definition was made on an
internally consistent functional basis, but only the morale ratings made
by the union leaders actually correlate significantly (inversely) with
departmental labor turnover rates. The departmental morale ratings made
by the personnel manager correlate significantly neither with labor
turnover (r = -.02) nor with productive efficiency (r = .02). These
results could be due in part to the fact that one-man ratings generally
are less reliable than pooled ratings.

Some relationships which fall a little short of statistical significance
may be worthy of comment and speculation, particularly the tendency of
departments with high percentages of young people to have high turnover,
and the tendency for departments with most supervision to have high
turnover. The possible turnover propensity of younger people in the plant
probably is due to the greater instability of youth, stemming partly from
the fact that young people more often have few or no dependents.

A tendency for departments with most supervisors per hundred workers to
have high labor turnover may be due, at least in part, to a
characteristic American reaction, namely, a tendency to rebel when
supervision and resulting “regimentation” get too thick and close in the
work environment.


Summary 


1. In a study of the relationship between labor turnover and each of 
twenty-four other variables in seven major manufacturing divisions of 
an Indianapolis electronics factory, three variables were found to be 
significantly related to turnover: (a) hourly earnings of male workers; (b) 
job monotony; (c) promotion probability. 

2. A larger study of forty variables in fifty-three departments of a 
Camden, New Jersey, electronics factory revealed statistically significant 
tendencies for departments highest in per cent of workers quitting the 
company for avoidable reasons to be departments which are: 


(a) high in per cent of females among hourly-paid employees; 

(b) low in hours worked per week per hourly-paid female; 

(c) low in hourly earnings of hourly-paid males; 

(d) high in difference in average hours worked per day by each sex; 
(e) low in social prestige for average job; 

(f) with a wage incentive system; 

(g) high in job monotony; and 

(h) low in morale (union ratings). 


Received August 19, 1946. 











The New USES General Aptitude Test Battery 


Beatrice J. Dvorak 
United States Employment Service, Washington, D. C. 


The new USES General Aptitude Test Battery is a combination of 
tests which measures a number of important aptitudes; and it supplies 
information regarding the individual’s possibilities for successfully learn- 
ing job performances in a great many occupations grouped together into 
fields of work. 

This Battery is a product of the Occupational Analysis and Industrial 
Services Division, which has been engaged in job and worker analysis for 
over a decade (2), (3), (5), (6). During this time a large number of bat- 
teries of tests have been developed for the prediction of success in specific 
occupations or small groups of related occupations. These batteries were 
in most instances standardized against a criterion of occupational success 
such as production records, and the Wherry-Doolittle Test Selection 
Method was employed to determine the combinations of tests having maxi- 
mum validity. The batteries were devised for use in the selection proc- 
ess, where attention is focused upon the specific job opening and its re- 
quirements, and the objective is to select the best qualified individual 
from all the available applicants. The employment counselor, on the 
other hand, focuses his attention on the individual and is often interested in 
testing to explore the possibilities of various kinds of work for that person. 
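The Wherry-Doolittle Test Selection Method mentioned above adds tests to a battery one at a time, each time taking the test that most raises the multiple correlation with the criterion after a correction for shrinkage, and stops when the corrected value no longer rises. The sketch below follows that idea with direct matrix algebra (R² = r′R⁻¹r) rather than the original hand-computation scheme. The shrinkage form used here, 1 − (1 − R²)(N − 1)/(N − m − 1), is the familiar adjusted-R² correction and is an assumption; it may differ in detail from the one in the original method. The function names and correlations are invented.

```python
import numpy as np


def shrunken_r2(r2: float, n: int, m: int) -> float:
    """Adjusted R^2 for m tests and n cases (assumed shrinkage form)."""
    if n - m - 1 <= 0:
        raise ValueError("need more cases than tests + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)


def multiple_r2(rxx: np.ndarray, rxy: np.ndarray, idx: list) -> float:
    """Squared multiple correlation of the criterion with the tests in idx."""
    sub = rxx[np.ix_(idx, idx)]
    r = rxy[idx]
    return float(r @ np.linalg.solve(sub, r))


def forward_select(rxx: np.ndarray, rxy: np.ndarray, n: int):
    """Greedy test selection; stops when the shrunken R^2 stops increasing."""
    chosen: list = []
    remaining = list(range(len(rxy)))
    best = 0.0
    while remaining:
        score, j = max(
            (shrunken_r2(multiple_r2(rxx, rxy, chosen + [k]), n, len(chosen) + 1), k)
            for k in remaining
        )
        if score <= best:
            break
        chosen.append(j)
        remaining.remove(j)
        best = score
    return chosen, best


# Invented intercorrelations of three tests and their validities, N = 100.
rxx = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.2],
                [0.3, 0.2, 1.0]])
rxy = np.array([0.5, 0.45, 0.1])
tests, r2_adj = forward_select(rxx, rxy, n=100)
print(tests, round(r2_adj, 3))
```

With these invented figures the third test raises the raw R² slightly but lowers the corrected value, so the selection stops after two tests; that trade-off is the point of correcting for shrinkage.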


Description of Battery 


The General Aptitude Test Battery consists of 15 tests chosen as a result
of factor analysis studies of a large number of tests (4). It measures 10
aptitudes which in varying degrees and combinations contribute to
occupational success: G, intelligence; V, verbal ability; N, numerical
ability; S, spatial ability; P, form perception; Q, clerical perception;
A, aiming; T, motor speed; F, finger dexterity; and M, manual dexterity.
Of the 15 tests, 11 are paper-and-pencil tests and 4 are apparatus tests.
The entire battery requires about 2¼ hours for administration.


Method of Standardization 


The Battery is being standardized on samples of workers employed in
various occupations. Norms are being developed for groups of occupations
established according to similarities in abilities. The standardization
process begins with a job analysis, the purpose of which is to identify
the job and to serve as the basis for the selection of the experimental
sample. The experimental sample then includes persons who are performing
the same kind of work, have passed the learning stage in their
proficiency on the job, and are regarded as satisfactory workers by their
foremen or supervisors. All such workers in a given occupation in any
plant are included, whenever possible, to avoid a biased sample resulting
from the supervisor's selection of either his best or his poorest workers
for testing. When the number of workers performing the same job in one
plant is very large and permission cannot be obtained to test all of
them, attention is given to the selection of a representative sample. The
standardization program provides for obtaining samples of workers in each
occupation from more than one locality. The entire Battery is
administered to each occupational sample.


Occupational Aptitude Patterns 


Norms have now been developed for 20 fields of work representing ap- 
proximately 2000 occupations. These norms are expressed as Occupa- 
tional Aptitude Patterns and consist of minimum aptitude scores re- 
quired for occupations grouped according to the Part IV classification 
code structure of the Dictionary of Occupational Titles. Those occupa- 
tions which require similar aptitudes have been grouped together into 
the same fields of work. The Occupational Aptitude Patterns were es- 
tablished after analysis of test data and job analysis schedules showed 
that certain occupations required a similar minimum amount of the same 
combination of aptitudes. Each Occupational Aptitude Pattern consists 
of minimum scores for only the most significant aptitudes required for the 
group of occupations covered by that pattern; scores on the other apti- 
tudes measured by the Battery generally would not add materially to the 
predictive value of the aptitude scores for a given pattern. The minimum
aptitude scores are expressed in terms of standard scores with 100 as the
general population average and a sigma of 20. Each minimum score is set
at 85, 100, 115, or 130, whichever point will eliminate approximately the
lowest third of the standard scores of the employed sample.
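The rule for setting a minimum score can be sketched in a few lines. The sketch converts raw scores to the standard-score scale described above (general-population mean 100, sigma 20), finds the point below which roughly one-third of the employed sample falls, and then, as an assumption not stated in the text, snaps that point to the nearest permitted level (85, 100, 115, or 130). The function names and the sample scores are hypothetical.

```python
import statistics

ALLOWED_MINIMUMS = (85, 100, 115, 130)


def standard_score(raw: float, pop_mean: float, pop_sd: float) -> float:
    """Convert a raw score to the scale with population mean 100, sigma 20."""
    return 100.0 + 20.0 * (raw - pop_mean) / pop_sd


def critical_score(sample_scores: list) -> int:
    """Pick the permitted minimum nearest the lower-third point of the sample."""
    lower_third_point = statistics.quantiles(sample_scores, n=3)[0]
    # Assumed rule: snap to the nearest permitted level.
    return min(ALLOWED_MINIMUMS, key=lambda level: abs(level - lower_third_point))


# Invented standard scores of nine employed workers on one aptitude.
sample = [88, 95, 102, 108, 112, 118, 121, 127, 135]
print(critical_score(sample))
```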

Some illustrative patterns are as follows. Pattern 1 is GV (intelligence
and verbal ability); the critical scores are 130 for G and 130 for V.
Kinds of work which require this pattern of aptitudes are occupations in
creative writing, translating, copy writing, and journalism. Pattern 2 is
GN (intelligence and numerical ability), with critical scores of 130 for
each. This pattern characterizes occupations in accounting and related
work. Pattern 4 is GNSF, with critical scores of 100 on intelligence,
numerical ability, and spatial ability, and 85 on finger dexterity.
Occupations covered by this pattern are those in all-round metal
machining and all-round mechanical repairing. Pattern 12 is NQTF, with
critical scores of 100 for numerical ability, clerical perception, and
motor speed, and 85 for finger dexterity. This pattern of abilities is
required for a group of occupations involving the operation of clerical
machines for computational purposes. Pattern 13 is NSM, with critical
scores of 85 for numerical ability, spatial ability, and manual
dexterity. This pattern of abilities is represented in groups of
occupations involved in heavy metal structural work, in plumbing and
related work, and in wood structural work.


Individual Aptitude Profile 


After the General Aptitude Test Battery has been administered to a
counselee, his scores are expressed as an Individual Aptitude Profile.
This consists of 10 aptitude scores, each of which is obtained from one
or more of the 15 tests in the Battery. The score for intelligence (G),
for example, is obtained from 3 tests; the verbal ability (V) score is
obtained from 1 test; and the numerical ability (N) score from 2 tests.
Tables have been established for converting each raw test score into a
weighted standard score for an aptitude. The weights are based on the
factor loadings of each test and were obtained by the Wherry-Doolittle
method; thus they represent the significance of each of the tests in
measuring a given aptitude. The aptitude score is the sum of the weighted
standard scores for each of the tests measuring that aptitude.
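The summing step just described can be sketched as follows. The actual USES conversion tables are not given here, so this sketch assumes that a weighted standard score is simply a weight times the test's standard score; the test names ("test_a" to "test_d") and the weights are hypothetical placeholders chosen only to mirror the numbers of tests mentioned for G, V, and N.

```python
# Hypothetical weights; the real USES conversion tables are not reproduced.
WEIGHTS = {
    "G": {"test_a": 0.4, "test_b": 0.3, "test_c": 0.3},  # G from 3 tests
    "V": {"test_b": 1.0},                                 # V from 1 test
    "N": {"test_c": 0.5, "test_d": 0.5},                  # N from 2 tests
}


def aptitude_scores(test_scores: dict, weights: dict = WEIGHTS) -> dict:
    """Sum the weighted standard scores of the tests measuring each aptitude."""
    return {
        aptitude: sum(w * test_scores[test] for test, w in tests.items())
        for aptitude, tests in weights.items()
    }


profile = aptitude_scores({"test_a": 110, "test_b": 100, "test_c": 90, "test_d": 120})
print({k: round(v, 1) for k, v in profile.items()})
```

A single test may feed more than one aptitude, as "test_b" does here; the mapping from aptitudes to tests carries that structure.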


Matching Profiles with Patterns 


The Individual Aptitude Profile is compared with the 20 Occupational
Aptitude Patterns to determine the fields of work that are most suitable
for the person's abilities. The mechanics of matching are simple and
involve the use of 2 cards. One of these shows the individual aptitude
profile, with the aptitude scores recorded in a row at the bottom of the
card. The other card contains a row of minimum aptitude scores for each
of the 20 Occupational Aptitude Patterns. The profile card is placed on
the pattern card so that its aptitude scores fall just above the row of
corresponding minimum scores for Pattern 1. By inspection it is
determined whether the relevant aptitude scores made by the individual
are lower than, equal to, or higher than the corresponding minimum
aptitude scores for that pattern. If the individual's scores equal or
surpass all the scores which make up that aptitude pattern, this fact is
indicated by encircling the number “1” in the box provided on the profile
card. These steps are repeated by moving the profile card down one row at
a time until the individual's aptitude scores have been compared with
each of the 20 patterns.
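The card procedure amounts to a simple rule: a pattern is "encircled" when every one of its minimum scores is equaled or exceeded. The sketch below applies that rule using only the five illustrative patterns quoted earlier (1, 2, 4, 12, and 13); the other fifteen patterns are not given in the text and are omitted. The function name and the sample profile are hypothetical.

```python
# Minimum (critical) scores for the five illustrative patterns given above.
PATTERNS = {
    1: {"G": 130, "V": 130},
    2: {"G": 130, "N": 130},
    4: {"G": 100, "N": 100, "S": 100, "F": 85},
    12: {"N": 100, "Q": 100, "T": 100, "F": 85},
    13: {"N": 85, "S": 85, "M": 85},
}


def matching_patterns(profile: dict, patterns: dict = PATTERNS) -> list:
    """Numbers of the patterns whose every minimum the profile equals or
    exceeds, i.e. the numbers that would be encircled on the profile card."""
    return [
        number
        for number, minimums in sorted(patterns.items())
        if all(profile.get(apt, 0) >= score for apt, score in minimums.items())
    ]


# Invented Individual Aptitude Profile.
profile = {"G": 110, "V": 105, "N": 115, "S": 100, "P": 95,
           "Q": 90, "A": 100, "T": 105, "F": 90, "M": 100}
print(matching_patterns(profile))
```

Only the aptitudes named in a pattern are checked, which mirrors the point that the other aptitude scores would not add materially to prediction for that pattern.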

This procedure avoids certain difficulties that have been recognized in
the use of occupational ability patterns of a graphic type (1). In the
usual instance, these have been constructed by charting the average
standard scores made by an occupational group on a standard battery of
tests. The process of comparing an individual's graphic profile of test
scores with a large number of such occupational patterns is very
cumbersome. Moreover, it is quite subjective. The counselor often has
difficulty in deciding how similar the individual's profile should be to
the average scores in the pattern, and sometimes attaches too much
significance to the correspondence in the shape of the profile and
pattern regardless of discrepancies in amounts of abilities. The critical
scores provided for the USES Battery obviate this difficulty. Further,
the graphic patterns have usually shown the scores of an occupational
sample on all the tests of a standard battery, so that the counselor must
subjectively determine the relative weights or degrees of importance to
be attached to scores from the various tests. The USES patterns, by
contrast, consist of critical scores on only those aptitudes which are
significant for a particular field of work.

Use of Battery 


The USES General Aptitude Test Battery has received limited use 
thus far in local employment offices throughout the country. This use, 
however, has been carefully observed by trained technicians in order to 
develop information which can be passed along to other offices which will 
soon be using the battery. It is expected that all local offices which have 
adequate testing facilities will be using these tests to assist them in both 
selection and counseling. 

Such use as has been made of the battery indicates that it is filling a 
very real need. In the past, whenever it was found necessary to test an 
individual for counseling purposes, a number of available tests were se- 
lected for administration which tapped a number of aptitudes. Since 
these tests were related to specific occupations, however, it was not possible 
to test an individual for more than forty or fifty occupations even if two 
or three entire days were spent in testing. The General Aptitude Test 
Battery, however, makes it possible to obtain information about an in- 
dividual’s aptitude for several thousand occupations in a little more than 
two hours of testing. 

This is not to say that the battery does not have limitations. One
limitation is that the ten aptitudes measured do not include such
important traits as artistic aptitude, musical aptitude, eye-hand-foot
coordination, etc. Many occupations require these aptitudes and
dexterities, and additional tests will no doubt need to be added to the
battery to make it more widely applicable. A second limitation of the
battery is that it does not cover all of the jobs existing in American
industry today. More research is necessary on the development of
additional occupational aptitude patterns and on the allocation of
additional occupations to the patterns already established. This research
involves not only the administration of the battery to large groups of
workers, but also additional analyses of jobs, as well as the
considerable data analysis required for establishing proper critical
scores for occupations.

Even at its present stage of development, however, the battery provides
useful information which has hitherto been impossible to obtain in such
a short time. Because of its efficiency, we find it necessary to caution
those who use it not to ignore the individual's physical capacities,
hobbies, interests, casual work experience, personal traits, education,
special training, and other factors involved in determining the final
choice of an appropriate vocation. In the counseling situation, tests
must be regarded as one of the tools at the disposal of the counselor.
When used with the other tools available to the counselor, they increase
the chances of providing the counselee with a more accurate description
of his vocational possibilities.

Received April 8, 1947. 
Early publication. 


References 


1. Dvorak, B. J. Differential occupational ability patterns. Minneapolis: The
University of Minnesota Press, 1935.

2. Shartle, C. L., Dvorak, B. J., and Associates. Occupational analysis activities in 
the War Manpower Commission. Psychol. Bull., 1943, 40, 701-713. 

3. Shartle, C. L., Dvorak, B. J., Heinz, C. A., and others. Ten years of occupational 
research, 1934-1944. Occupations, 1944, 7, 387-446. 

4. Staff, Division of Occupational Analysis, War Manpower Commission. Factor
analysis of occupational aptitude tests. Educ. Psychol. Measmt., 1945, 5, 147-155.

5. Stead, W. H., and Masincup, Earl. The occupational research program of the United 
States Employment Service. Chicago: Public Administration Service, 1941. 

6. Stead, W. H., Shartle, C. L., and Associates. Occupational counseling techniques. 
New York: American Book Company, 1940. 








Test Scores and Efficiency Ratings of Machinists * 


Rose G. Anderson 
Psychological Corporation, New York City, N. Y. 


A project in the Woodward Governor Company which involved the 
vocational counseling of 85% of all plant members, 1184 individuals, 
afforded an unusual opportunity to do broad range testing and to secure 
comparative supplementary personal data through interviews, personnel 
records, and supervisors’ ratings. 

The present report is on an analysis of the test results for 174
machinists in comparison with supervisors' ratings of four job traits:
quality of performance, quantity of performance, rate of learning, and
job knowledge.

The group was carefully selected to include only those employees 
doing jobs involving machine skills—all types of milling machines, grind- 
ers, lathes, honers, hobbers, borematics, jig bores, automatic screw ma- 
chines, diesel assembly, etc. 

The tests studied include the Adult Placement Test, a recently developed
test by the writer, which yields separate verbal and number scores. The
test has been designed for use in selection and placement in industry,
especially of white-collar personnel. It is used in this study as a
measure of general intelligence or mental alertness. In support of this
use, the Pearson correlation coefficient of the total score on the test
with the American Council on Education Psychological Examination, 1940
edition, is .82 for 83 industrial employees. The correlation with Form 8
of the Army Alpha for 208 nursing school applicants is .78. This test is
designated as A-P in the tables.

The other standard tests used were the Bennett Mechanical Comprehension
Test, Form AA, the Revised Minnesota Paper Form Board Test, and the
MacQuarrie Mechanical Aptitude Test. These tests are designated as B-AA,
M. Fm. Bd., and Mac. Total, respectively, in the tables. Parts of the
MacQuarrie Test have been analyzed separately. The score for tests one to
three inclusive and the combined score for tests four and five are
designated as Mac. (1-3) and Mac. (4-5), respectively, in the tables. The
former comprises three tests which primarily involve manual dexterity and
fine eye-hand coordination. The latter two tests involve two-dimensional
space relationships. (The part scores for the MacQuarrie have not been
divided by three, as is done in arriving at the final score.)

* Read at the American Psychological Association meetings, Philadelphia, September
4, 1946. Olive Bray, Enid Wagner, and Wilbur King assisted in the statistical
analysis of results.
In addition, two tests previously developed for plant selection purposes
were also included in the study. One of these is a trade information test
adapted from material originally developed by others. The other is a
three-dimensional paper-and-pencil block test adapted from standard tests
of this nature. These are designated as “Job Knowledge” and “Blocks,”
respectively, in the tables.

Fig. 1. Sample of rating scale.

Merit Rating Form

Name: R. O.        Clock No.: 1345        Present Job: O. D. Grinder

Instructions:

1. Read the statements following each factor. Then place an X in one of the small
squares over the statement which most nearly fits the employee. If the statement
satisfactorily fits the employee, place the X in the right square. If he does not
quite measure up to the statement but is definitely better than the next lower
grade, place an X in the left square.

2. Make your rating with great care. Do not allow personal feelings to affect
your rating.

3. Consider one factor at a time. Try not to let your over-all impression of the
person govern your rating of any one factor.

1. [Quality of work]
   Below Average: Frequently fails to meet quality standards. Frequent errors or
   rejected work.
   Average: Usually meets quality standards. Rejected work or errors not frequent.
   Above Average: Maintains high quality standards. Rejected work or errors very rare.

2. Quantity of work.
   Below Average: Below average in production. Barely “gets by.”
   Average: Average or “run-of-the-mill” producer.
   Above Average: A “top producer.” Exceptionally high production rate.

3. Job Knowledge.
   Below Average: Knowledge of job limited. Frequently needs assistance.
   Average: Knows his own job well. Seldom needs assistance.
   Above Average: Well informed on his own job. Needs no assistance.

4. Cooperative Attitude.
   Below Average: Cooperates only when he has to. Not very cooperative.
   Average: Meets others halfway. Usually willing to cooperate.
   Above Average: Exceptionally good team worker. Goes out of his way to cooperate.

5. [Rate of learning]
   Below Average: Very slow to absorb. Has to be nursed along.
   Average: Requires average amount of instruction.
   Above Average: Exceptionally quick to grasp explanations, learn new tasks.

6. [Leadership]
   Below Average: Seldom or never assumes leadership when with a group. Others
   rarely or never accept his guidance.
   Average: Often assumes leadership when with a group. Others usually accept his
   guidance.
   Above Average: Usually assumes leadership when with a group. Others naturally
   turn to him for direction.

Raters: T. O. (Dept. Mgr.); G. B. (Supervisor)
6 mos. or less        7-12 mos.        1 year or more

An unusual degree of intelligent interest and cooperation was given by
the supervisors in arriving at trait ratings. The supervisor and the
assistant supervisor independently rated each machinist in conference
with the psychologist. Then the two supervisors and the psychologist
discussed and reconciled conflicting ratings. A preliminary orientation
conference preceded the ratings in every case. In this conference the
possible halo effect of favorable or unfavorable general impressions of
ability or personality, and its avoidance in rating specific traits, was
discussed. In each case, descriptions of the different categories
(Figure 1) were amplified in discussion. Each group of workers was rated
on one specific trait at a time in order to focus attention on the trait
being considered. Repeated emphasis was needed on the points that the
average worker in the group was a “good” worker, that only those who were
distinctly superior to the group as a whole were to be rated superior,
and that only those who were distinctly below the group as a whole were
to be rated low. As a result it is felt that unusually reliable ratings
with very satisfactory distributions were obtained for both separate
traits and the composite of the traits.

The results have been presented in a form to be practically useful and 
easily intelligible. Scattergrams of test scores and ratings for the sepa- 
rate job traits and the composite ratings for the four traits were studied to 
determine the most feasible critical cut-off scores, i.e. those which would 
eliminate the largest proportion of the below average workers while sacri- 
ficing the smallest proportion of superior and average workers. 
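The cut-off procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's method: the scores are invented, a worker is assumed to be eliminated when his score falls strictly below the critical score, and the hypothetical `separation` criterion (per cent of the low group eliminated minus the larger of the per cents lost from the average and high groups) is only one way of expressing the trade-off that was actually judged from scattergrams.

```python
from typing import Dict, List


def percent_eliminated(scores: List[float], cutoff: float) -> float:
    """Per cent of one rated group scoring below the critical score."""
    if not scores:
        return 0.0
    below = sum(1 for s in scores if s < cutoff)  # assumed rule: strictly below
    return 100.0 * below / len(scores)


def elimination_table(groups: Dict[str, List[float]], cutoff: float) -> Dict[str, float]:
    """Per cent eliminated from each rated group by a single critical score."""
    return {name: percent_eliminated(s, cutoff) for name, s in groups.items()}


def separation(table: Dict[str, float]) -> float:
    """Hypothetical criterion: low-group losses minus the larger upper-group loss."""
    return table["low"] - max(table["average"], table["high"])


def best_cutoff(groups: Dict[str, List[float]], candidates: List[float]) -> float:
    """Candidate critical score with the largest separation (first one wins ties)."""
    return max(candidates, key=lambda c: separation(elimination_table(groups, c)))


# Hypothetical scores for three rated groups; not data from this study.
example_groups = {
    "low": [40, 45, 50, 52, 56, 61],
    "average": [51, 60, 61, 63, 66, 70],
    "high": [58, 62, 68, 72, 75, 80],
}
cut = best_cutoff(example_groups, [45, 50, 54, 59, 64])
print(cut, elimination_table(example_groups, cut))
```

With these invented scores the sketch picks 59, eliminating about 83% of the low group and about 17% of each upper group; a different weighting of the upper-group losses could pick a different score.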

The entire group studied had taken the Adult Placement Test, the 
Bennett Test of Mechanical Comprehension, and the Minnesota Paper 
Form Board Test. Decreasing numbers had taken the other tests. 

Tables 1-5 show the per cents of eliminations from the three rated
groups on the basis of the most advantageous critical test scores. In
each table the tests which best differentiated the three groups have been
marked with a dagger. It will be noted that the critical scores varied from
trait to trait, and that the contribution of specific tests varied somewhat
according to the trait being considered.

For quality of performance (Table 1) the Adult Placement critical 
score of 54 differentiates best, with a Job Knowledge score of 64 a close 
second. Either an Adult Placement score of 54 or a Bennett AA score 
of 40 eliminates 11% more of the low group, but also 6% more of the high 
group than the Adult Placement test alone. 
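The “either ... or” combinations in these tables eliminate a worker who falls below the critical score on at least one of the tests. A minimal sketch of that rule follows; the record layout, the function names, and all scores are hypothetical, and only the test labels and critical scores are taken from the text.

```python
from typing import Dict, List, Mapping

Record = Mapping[str, float]  # one worker: test label -> score (hypothetical layout)


def fails_any(record: Record, cutoffs: Mapping[str, float]) -> bool:
    """True if the worker is below the critical score on at least one test."""
    return any(record[test] < cut for test, cut in cutoffs.items())


def percent_eliminated_either(groups: Dict[str, List[Record]],
                              cutoffs: Mapping[str, float]) -> Dict[str, float]:
    """Per cent of each rated group eliminated under the 'either ... or' rule."""
    table: Dict[str, float] = {}
    for name, records in groups.items():
        failed = sum(1 for r in records if fails_any(r, cutoffs))
        table[name] = 100.0 * failed / len(records) if records else 0.0
    return table


# Hypothetical workers; the test labels follow the tables, the scores are invented.
workers = {
    "low": [{"A-P": 50, "B-AA": 45}, {"A-P": 60, "B-AA": 35},
            {"A-P": 70, "B-AA": 50}, {"A-P": 48, "B-AA": 30}],
    "high": [{"A-P": 80, "B-AA": 55}, {"A-P": 65, "B-AA": 38},
             {"A-P": 90, "B-AA": 60}, {"A-P": 75, "B-AA": 48}],
}
print(percent_eliminated_either(workers, {"A-P": 54}))              # {'low': 50.0, 'high': 0.0}
print(percent_eliminated_either(workers, {"A-P": 54, "B-AA": 40}))  # {'low': 75.0, 'high': 25.0}
```

Adding a second test can only raise (or leave unchanged) the per cent eliminated in every group, which is why the combinations gain in the low group at some cost in the upper groups.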





380 Rose G. Anderson 


Table 1
Supervisors’ Ratings of Quality and Test Scores Showing Per Cents Eliminated

                                                  Ratings of Quality of Performance
N            Tests                                Low %    Average %    High %

174 Cases    A-P, Sc. 54*                         30  14†
             B-AA, Sc. 50                         16  13
             M. Fm. Bd., Sc. 37                   32  11

149 Cases    Mac. (1-3), Sc. 89                   36  15
             Mac. (4-5), Sc. 54                   19  27
             Mac. Total, Sc. 59                   16  28

119 Cases    Job Kn., Sc. 64                      35

120 Cases    Blocks, Sc. 34

174 Cases    A-P, Sc. 54 or B-AA, Sc. 40          20†

119 Cases    A-P, Sc. 54 or Job Kn., Sc. 64       67       47           20

119 Cases    B-AA, Sc. 40 or Job Kn., Sc. 64      57       38           23





* Cut-off scores differ for ratings for separate traits as shown in Tables 1, 2, 3, 4, 5, 
and 6. Tests which showed the best group differentiation in each of the tables are 
indicated by a dagger in the last column of each table. 


For quantity of production (Table 2) an Adult Placement critical 
score of 64 provides the best differentiation, but eliminates a larger pro- 
portion of the two upper groups than is desirable. The Minnesota Paper 
Form Board critical score of 37 is next most effective. It might be antic- 
ipated that the Bennett AA and the Job Knowledge Test would not show 
significant relationships with output. It might be expected, however, 
that the MacQuarrie test would show a more direct relationship because 
of its emphasis on manual dexterity and speed. 

For rate of learning (Table 3) an Adult Placement score of 54
differentiates well between the extreme groups but sacrifices more of the
middle group than is desirable. The MacQuarrie differentiates the three
groups well but eliminates a rather small proportion of the low group. 
A Minnesota Paper Form Board score of 40 is the next most effective. 
For this trait the best elimination by two tests is on the basis of either a 
critical score of 54 on the Adult Placement or of 84 on the MacQuarrie 
(1-3). 







For job knowledge (Table 4) a Job Knowledge Test score of 64 dif- 
ferentiates best. A Bennett AA score of 43 is next most effective, but 
eliminates a rather high per cent (23) of superior workers. 

Comparisons for the cumulative ratings for the four job traits in 
question (Table 5) show the best differentiation through an Adult Place- 
ment critical score of 59. A Bennett AA score of 39 and a Job Know- 
ledge score of 59 are next most effective. The two lower groups, however, 
are not differentiated by the Bennett AA, nor the two upper groups by the 
Job Knowledge Test. The Adult Placement and Bennett combined elim- 
inate 7% more of the low group but also 3% more of the middle group 
than the Adult Placement alone. 


Table 2
Supervisors’ Ratings of Quantity and Test Scores Showing Per Cents Eliminated

                                                  Ratings of Quantity
N            Tests                                Low %    Average %    High %

174 Cases    A-P, Sc. 64                          52       38           28†
             B-AA, Sc. 46                         47       58           43
             M. Fm. Bd., Sc. 37                   34       22           17†

149 Cases    Mac. (1-3), Sc. 94                   37       40           45
             Mac. (4-5), Sc. 54                   30       43           20
             Mac. Total, Sc. 63                   46       48           24

119 Cases    Job Kn., Sc. 64                      32       31           26

120 Cases    Blocks, Sc. 39                       34       43           26

174 Cases    A-P, Sc. 54 or B-AA, Sc. 40          46       34           30

149 Cases    A-P, Sc. 64 or Mac. Total, Sc. 49    58       48           27





In order to rule out the factor of selection in the test comparisons,
further analysis was made of the composite efficiency ratings for the 149
machinists who had taken the Adult Placement, Bennett Mechanical
Comprehension, Minnesota Paper Form Board, and MacQuarrie Test for
Mechanical Ability. Table 6 shows these results for the separate tests
and for combinations of tests. Since the MacQuarrie (1-3) scores
differentiate slightly better than the MacQuarrie (4-5) or the MacQuarrie
total scores, the former are used in the test combinations. Although the
trends are essentially the same, the actual per cents of eliminations are
slightly more favorable for both an Adult Placement score of 54 and a







Table 3
Supervisors’ Ratings and Test Scores Showing Per Cents Eliminated

                                                  Ratings of Rate of Learning
N            Tests                                Low %    Average %    High %

174 Cases    A-P, Sc. 54                          44       33           13†
             B-AA, Sc. 43                         47       28           22
             M. Fm. Bd., Sc. 40                   49       37           22†

149 Cases    Mac. (1-3), Sc. 84                   19  13
             Mac. (4-5), Sc. 44                   10  11
             Mac. Total, Sc. 55                   13  5†

120 Cases    Job Kn., Sc. 69                      55  23

119 Cases    Blocks, Sc. 39                       28  26

174 Cases    A-P, Sc. 54 or B-AA, Sc. 40

149 Cases    A-P, Sc. 54 or Mac. (1-3), Sc. 84

149 Cases    B-AA, Sc. 43 or Mac. Total, Sc. 55   51  22

149 Cases    B-AA, Sc. 43 or M. Fm. Bd., Sc. 40   70  33

149 Cases    M. Fm. Bd., Sc. 40 or Mac. Total, Sc. 55
                                                  54       38           22

149 Cases    A-P, Sc. 54 or M. Fm. Bd., Sc. 40    70       52           29











Bennett score of 39. The Adult Placement test alone differentiates more
effectively than any other single test or any combination of tests, and
the best combination of two tests is the Adult Placement with the
Minnesota Paper Form Board.
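Table 6 sorts the 149 machinists into three bands of composite score (4-9, 10-15, and 16-20 on four job traits) and expresses each elimination as a per cent of the band. The sketch below assumes, since the scale values are not given here, that each trait was rated from 1 to 5 and that the three bands are the low, average, and high groups of the text; a figure such as 47.5 in the (4-9) column then corresponds to 19 of the 40 cases. The function names are illustrative only.

```python
from typing import Sequence


def composite(trait_ratings: Sequence[int]) -> int:
    """Sum of four trait ratings, each assumed to run from 1 to 5."""
    if len(trait_ratings) != 4 or not all(1 <= r <= 5 for r in trait_ratings):
        raise ValueError("expected four ratings between 1 and 5")
    return sum(trait_ratings)


def rated_group(total: int) -> str:
    """Place a composite score in the Table 6 bands (4-9, 10-15, 16-20)."""
    if 4 <= total <= 9:
        return "low"
    if 10 <= total <= 15:
        return "average"
    if 16 <= total <= 20:
        return "high"
    raise ValueError("composite score outside 4-20")


def percent_of_group(count: int, group_size: int) -> float:
    """Per cent of a rated group, to one decimal place as in Table 6."""
    return round(100.0 * count / group_size, 1)


print(rated_group(composite([2, 2, 1, 3])))  # low
print(percent_of_group(19, 40))              # 47.5
print(percent_of_group(22, 75))              # 29.3
```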

The percentiles corresponding to the cut-off scores indicate the levels of
workers eliminated. The Adult Placement score of 54 is at the 28
percentile for the entire plant (1347 cases), at the 29 percentile for the
total plant machinists’ group (404 cases), at the 49 percentile for plant
workers reporting a 9th grade education (66 cases), at the 70 percentile
for plant workers reporting an 8th grade education (210 cases), and at the
80 percentile for plant workers (61 cases) reporting less than an 8th grade
education.
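The author does not state how these percentiles were computed. A common convention, assumed in the sketch below, takes the per cent of the norm group scoring below the cut-off; a variant also credits half of the cases tied at the cut-off. The norm scores are invented. Computing the rank separately against each norm group is what allows a single score of 54 to fall anywhere from the 28 to the 80 percentile in the paragraph above.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def percentile_rank(norms: Sequence[float], score: float,
                    count_half_ties: bool = False) -> float:
    """Per cent of a norm group scoring below `score`.

    With count_half_ties=True, half of the cases tied exactly at `score`
    are also credited (another common convention).
    """
    if not norms:
        raise ValueError("norm group is empty")
    ordered = sorted(norms)
    below = bisect_left(ordered, score)          # cases strictly below the score
    ties = bisect_right(ordered, score) - below  # cases exactly at the score
    credited = below + (0.5 * ties if count_half_ties else 0)
    return 100.0 * credited / len(ordered)


# Hypothetical norm group of ten scores (not the plant's norms).
norms = [30, 40, 50, 54, 54, 60, 70, 80, 90, 100]
print(percentile_rank(norms, 54))                        # 30.0
print(percentile_rank(norms, 54, count_half_ties=True))  # 40.0
```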







The Bennett cut-off score of 39 is at the 35 percentile for candidates 
for technical training courses, at the 50 percentile for 12th grade boys, and 
at the 18 percentile for engineering school freshmen. 

The Minnesota Paper Form Board Test cut-off score of 34 is at the 15 
percentile for plant machinists and at the 25 percentile for plant men aged 
26 years and over. On the basis of the published norms a score of 34 is 
at the 30 percentile for 12th grade boys, at the 13 percentile for engineer- 
ing school freshmen, and at the 65 percentile for unselected adult men. 


Table 4
Supervisors’ Ratings of Job Knowledge and Test Scores Showing Per Cents Eliminated

                                                  Ratings of Job Knowledge
N            Tests                                Low %    Average %    High %

174 Cases    A-P, Sc. 54                          37       33           20
             B-AA, Sc. 43                         42       31           23
             M. Fm. Bd., Sc. 40                   38       44           27

149 Cases    Mac. (1-3), Sc. 89                   37       49           21
             Mac. (4-5), Sc. 54                   27       46           22
             Mac. Total, Sc. 55                   21       23           3

120 Cases    Job Kn., Sc. 64                      49  33

119 Cases    Blocks, Sc. 34                       39  20

120 Cases    A-P, Sc. 54 or Job Kn., Sc. 64       62       56           32

174 Cases    A-P, Sc. 54 or B-AA, Sc. 43          55       47           33

107 Cases    B-AA, Sc. 43 or Job Kn., Sc. 64      50       38           27





A MacQuarrie (1-3) cut-off score of 84 cannot be converted directly into a
percentile but is approximately at the 30 percentile for young men.

The Pearson correlation coefficients between the composite efficiency
ratings and test scores (Table 7) confirm the trend of the previous results.
Further, they indicate that the contribution of the number tests (N-score)
of the Adult Placement is greater than that of the verbal tests (V-score).
It is also noteworthy that the MacQuarrie Block and Pursuit Tests lower
rather than raise the correlation of the part scores with the efficiency
ratings. Compared with the other analysis, the correlations are less
revealing as to the







Table 5
Composite Supervisors’ Ratings for Four Job Traits and Test Scores
Showing Per Cents Eliminated

                                                  Composite Ratings for
                                                  Four Job Traits
N            Tests                                Low %    Average %    High %

174 Cases    A-P, Sc. 59                          50       38           13†
             B-AA, Sc. 39                         29       19           10†
             M. Fm. Bd., Sc. 39                   40       36           10

149 Cases    Mac. (1-3), Sc. 84                   25       23           6
             Mac. (4-5), Sc. 54                   25       40           12
             Mac. Total, Sc. 59                   30       28           6

120 Cases    Job Kn., Sc. 59                      47  20

119 Cases    Blocks, Sc. 34                       32  24

174 Cases    A-P, Sc. 59 or B-AA, Sc. 39          57  41

149 Cases    A-P, Sc. 59 or Mac. Total, Sc. 63    59       59           21

149 Cases    B-AA, Sc. 39 or Mac. Total, Sc. 63   54       48           15





practical contributions which the tests can make in employee
differentiation.
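Table 7 reports product-moment (Pearson) correlations between the composite efficiency ratings and each test score. A minimal Python version of the coefficient is sketched below; the four pairs of ratings and scores are invented purely to show the computation, and the function name is illustrative.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no spread has no correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical composite ratings and test scores for four workers.
ratings = [1, 2, 3, 4]
scores = [1, 3, 2, 4]
print(round(pearson_r(ratings, scores), 2))  # 0.8
```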


Cooperative Attitude and Efficiency Ratings


Supervisors’ ratings of cooperative attitude and leadership capacity
were obtained along with the efficiency ratings. Cooperative relationships
between a supervisor and his subordinates might materially influence
ratings of job efficiency. In order to make the composite job ratings and
the ratings for cooperation comparable, the mean rating on the four job
traits has been computed for each of the groups. For both the low and
middle groups, cooperative ratings are higher than the mean efficiency
ratings (Table 8). For the high group, cooperative ratings are lower. These
results indicate that supervisors’ job ratings had not been unduly
influenced by the employee’s level of cooperation. The Pearson r’s show
lower correlations for the low and middle groups, +.13 and +.28
respectively, a correlation of +.40 for the high group, and a correlation of +.52
for the entire group. These results would seem to indicate that com- 







Table 6
Supervisors’ Ratings and Test Scores Showing Per Cents Eliminated by
Critical Test Scores
Composite Scores on Four Job Traits, N = 149 Cases

                                        Supervisors’ Ratings
                                        (4-9)       (10-15)     (16-20)
                                        40 Cases    75 Cases    34 Cases

One specific test
  A-P, Sc. 54                           47.5        29.3        5.9†
  B-AA, Sc. 39                          27.5        14.6        8.8
  Mac. (1-3), Sc. 84                    25.0        22.6        5.9
  Mac. (4-5), Sc. 54                    25.0        40.0        11.8
  Mac. Total, Sc. 59                    30.0        28.0        5.9
  M. Fm. Bd., Sc. 34                    15.0        18.6        2.9

One of two tests
  A-P or Mac. (1-3)                     55.0        40.0        11.8
  A-P or B-AA                           55.0        34.7        14.7
  A-P or M. Fm. Bd.                     52.5        37.3        8.8
  B-AA or Mac. (1-3)                    45.0        32.0        11.8
  B-AA or M. Fm. Bd.                    37.5        26.7        11.8
  M. Fm. Bd. or Mac. (1-3)              35.0        32.0        8.8

One of three tests
  A-P, B-AA, or Mac. (1-3)              60.0  45.3
  A-P, B-AA, or M. Fm. Bd.              60.0  42.7
  B-AA, Mac. (1-3), or M. Fm. Bd.       40.0

One of four tests                       70.0        52.0        2.9

Fail A-P, pass other three              18.7





Table 7
Correlations between Efficiency Ratings and Test Scores

                              Pearson r

A-P Total Score               +.40
A-P N-Score                   +.28
A-P V-Score                   +.16
Bennett AA Score              +.21
Minn. Fm. Bd. Score           +.22
Mac. (1-3)                    +.15
Mac. (4-5)                    +.17
Mac. Total                    +.06
Job Knowledge                 +.21
Blocks                        +.31






















Table 8
Cooperative Attitude and Efficiency Ratings

                  Coop.        Mean Score,
                  Attitude     4 Job          Pearson     P.E.
                  Score        Traits         r           r
Low group         2.3          1.9            +.13        .104
Middle group      3.4          3.1            +.28        .072
High group        3.9          4.4            +.40        .097
Total             3.2          3.1            +.52        .040





petence and cooperativeness are positively correlated rather than that the 
degree of cooperation of the worker earned him higher efficiency ratings. 
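The probable errors in Table 8 follow the classical formula P.E. = 0.6745(1 - r²)/√N. The sketch below assumes that the group sizes are those of Table 6 (40, 75, and 34 cases, 149 in all), which Table 8 itself does not state; the function name is illustrative.

```python
import math

# Probable error = 0.6745 x standard error (the normal-curve quartile deviate).
PE_FACTOR = 0.6745


def probable_error_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r computed from n cases."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return PE_FACTOR * (1.0 - r * r) / math.sqrt(n)


# Group sizes are assumed to be those of Table 6 (40, 75, 34; total 149).
for label, r, n in [("Low", 0.13, 40), ("Middle", 0.28, 75),
                    ("High", 0.40, 34), ("Total", 0.52, 149)]:
    print(label, round(probable_error_r(r, n), 3))
```

Under that assumption the sketch gives about .072, .097, and .040 for the middle, high, and total groups, matching Table 8, and about .105 for the low group, where the table gives .104; the small discrepancy may be a matter of rounding.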


High Test Scores and Low Efficiency Ratings 


In Table 6 it will be noted that 70% of the low group were eliminated
by “failure” on any one of the four tests. This result invites study of the
30% who “passed” all four tests along with 97.1% of the high efficiency
group. Such study points up the significant facts that test results are
not a substitute for personal history and interview data, nor for the
critical judgment of the personnel or guidance officer, and that supervisors’
judgments, however critical, are not the final validity criterion of job
fitness. These men not only passed all four tests; they scored
consistently high (Table 9).











Table 9 
Low Efficiency Ratings—“Passing” Test Scores (N = 12) 
A-P B-AA M. Fm. Bd. Mac. (1-3) 
Critical scores 54 39 34 84 
Mean of group 92 48 47 98 
Range of group 70-116 44-57 39-51 89-150 





Supplementary data show that the twelve men in this group are mature,
well-educated, and experienced: median age 36, median years of
education 12, and median experience in the plant 24 months.¹ The
inference to be drawn is that these men are vocationally misplaced or are
unproductive because of factors other than ability.

Complete counseling reports are on file for all twelve of these men. 


¹ The corresponding medians for the eliminated group are age 34, education
11.0 years, and experience 24 months. The range in this group is much wider
in all three respects.
























As might be expected their cases are far from simple. The complete 
evidence indicates that low efficiency ratings are primarily related to per- 
sonality factors in two cases, primarily lack of motivation in three cases, 
and to a combination of both of these in four cases. In three cases the 
explanation is obscure. These men were qualified for their jobs and
expressed interest in continuing in them. Two of them rated below average
in cooperation. Two of them had unrecognized stronger aptitudes and 
interest trends in non-mechanical directions. In the third case super- 
visors emphasized the need for more experience. 

The median Bernreuter Personality Inventory percentiles, B1-N, 41;
B2-S, 49; and B4-D, 42, are inconclusive and reflect no consistent trends
which might explain low productivity. Of the six cases where personality
factors are significant in accounting for low efficiency, four are
suspect on the basis of their Bernreuter scores; two would be judged to be 
stable, self-confident, and not too strongly dominant to respond favorably 
to direction. 

As a part of routine plant employment procedure these men had taken 
the Humm-Wadsworth Temperament Scale. Their classifications on the
basis of this test range as follows in the different categories: normal,
moderately weak to very strong; hysteroid, weak to moderately strong;
manic, borderline to very strong; depressive, very weak to moderately
strong; autistic, borderline to very weak; paranoid, very weak to strong;
epileptoid, borderline to very weak.

A follow-up a year later reveals that all of these men have left the 
plant, four of their own volition, seven at the time of the general lay-off 
of 600 employees after V-J day. One was discharged. The composite 
information on their termination blanks is given in Figure 2 on a repro- 
duction of a part of the plant’s termination blank. 

It is significant that the plant would rehire eight of these men,
production permitting, that only one of the group is rated below average in
quality of work, and that four are rated below average in attitude. The four the
plant would not rehire include the one below average in quality and three 
below average in attitude. 

The counseling reports of these men of excellent ability include con- 
siderable interpretation and discussion of emotional and personality fac- 
tors, their early conditioning, possible reconditioning, and implications 
for vocational and social adjustment. The reports bear evidence of the
desirability of such continuous counseling in the interests of industrial
efficiency and human salvaging. It is not unlikely that benefits from the 
counseling are reflected in the termination ratings and in the fact that the 
plant would rehire two-thirds of the men, production permitting. 







Gave notice            3¹     Laid off (no work)    7     Rehire if possible
Quit without notice    1²     Leave of absence            Rehire, production permitting
Discharged             1³     Retired                     Would not rehire

Reason for leaving:

(1)¹ To enter Army.

(2)¹ To do independent electrical contracting.

(3)² Wife ill; wanted straight day shift.

(4)² Not interested in his work and has another job.

(5)³ Would not stay at machine, behind rate on most jobs.

                                     Below                  Above
                            Low      Average    Average     Average    High

Quality of work             1  7  4
Quantity of work            2  8  1  1
Dependability               1  5  4
Safety habits and good
  housekeeping              6  5  1
Attitude                    2        2          5           1          2

Fig. 2. Termination data for 12 men who received “low” efficiency ratings
but “passed” the tests.


Summary 


In summary, an analysis of supervisors’ efficiency ratings of 174 ma- 
chinists and their scores on a number of tests has shown the practical 
contributions which tests can make in segregating machinists according 
to their supervisors’ ratings on four job traits and the composite ratings 
for the four traits. The effectiveness of the different tests varied
according to the specific trait. Somewhat contrary to expectations, a
paper-and-pencil test of general intelligence, singly or in combination
with others, was most discriminating in segregating these machinists
according to their efficiency ratings.

The importance of supplementing both supervisors’ judgments and 
test scores by personal history and interview data has been emphasized 
through the detailed study of twelve men securing high test scores and low 
efficiency ratings. 


Received September 13, 1946. 





A Study of Success and Failure Among Student Nurses 


Irwin August Berg 
University of Illinois 


During the first months of training, student nurses represent a con- 
siderable expense to hospitals because they require a great deal of super- 
visory time. With increased training, of course, the students become 
progressively more proficient and thus more valuable to the training insti- 
tution. Yet in spite of strong financial and other incentives to have as 
many student nurses as possible complete training, the number who drop 
training is exceedingly high. In 1943, there were 41,270 students
admitted to nurses’ training in the United States, and within a nine-months’
period more than 38 per cent had withdrawn from training (10, p. 42). 
This has occurred in spite of a goodly number of competent studies of 
predicting nursing success which sought to reduce the number of failures. 

A major difficulty encountered by such prediction studies is that nurs- 
ing, more than most professions, demands a successful combination of 
practical and theoretical work. The relative importance of these two 
aspects, of course, has varied over a period of years and, probably, from 
hospital to hospital. Studies done about fifteen or twenty years ago (7, 
11, 16, 21, 23) indicated with varying degrees of emphasis that intelli- 
gence, as measured by tests like the Army Alpha, the Binet, or the A. C. 
E. Psychological Examination, was not important for success in nursing. 
Apparently, many nurses of that period felt the same way; for when 
Blazier (3) in 1924 asked 250 nurses to list personal and physical qualities 
which they considered important for nursing success, intelligence was not 
listed by them. It may be significant that the same nurses listed good feet 
as the third most important of ten physical qualities. 

But studies done in the last ten years (4, 9, 22, 24) indicate that in- 
telligence tests do bear some relationship to success in schools of nursing. 
This does not mean, of course, that the tests have materially changed, 
but rather that the entrance and classroom standards have been raised. 
During the past twenty years, the percentage of entering nursing students 
who were high school graduates increased steadily, and by the beginning 
of 1944 it was 100 per cent. An increasing number of hospitals are affili- 
ating with colleges to offer both a college and a nursing degree. Also, 
more attention has been paid in recent years to the quality of classroom 
work offered, and many hospitals now employ as teachers nursing educ- 






390 Irwin August Berg 


ation specialists who have taken college work in education in addition to 
being graduate nurses. Thus the classroom work is being given a steadily 
growing emphasis; and nursing success is being measured currently as 
much by classroom achievement as by the previously paramount practical 
work in the wards. Accordingly, the predictive value of psychological 
tests has risen with this change in emphasis. 

In the present study, psychological test and other data were analyzed 
for 110 student nurses who were admitted to a school of nursing during 
the period 1943-1945. All of these students, in addition to the usual re- 
quirements of health, had graduated in the upper half of their high school 
classes. They had also satisfied the admission requirements of the Uni- 
versity of Illinois because a minimum of 18 hours of college course work 
was taken at the University as part of the regular nursing curriculum. 
After being admitted to training, these students were given a test battery 
composed of the A. C. E. Psychological Examination (1940 edition), the 
George Washington University Series of nursing tests (13, 14, 15), the 
Preference Record (17), and the Multiple Choice Test (12). Comparisons 
were made of the successful students who were still in training with those
who were dropped because of poor scholarship, with those who quit
because they disliked training, and with those who quit training for other
reasons. These comparisons are summarized in Tables 1, 2, and 3.


Table 1 
Test Performance of Successful and Unsuccessful Nursing Students 








Dropped, Quit, Quit, 

In Poor Disliked Other 
Training Scholarship Training Reasons 
Test N = 74 N = 15 N=7 N = 14 





A. C. E. 99.9 77.9 86.9 99.9 
24.1 23.5 22.3 16.9 


Nursing Aptitude 160.4 133.0 145.0 
21.4 16.8 24.4 20.4 


Nurses’ Reading 72.3 60.0 69.6 
9.8 8.5 12.6 9.0 


Nurses’ Arithmetic 33.1 28.8 30.6 
8.9 9.8 10.8 8.8 





Discussion of Results 


Of the original group of 110 nurses who began training, 74 or 67.3 per 
cent were still in the nursing curriculum and were performing their duties 
successfully after almost 19 months. Thirty-six or 32.7 per cent had been 





Study of Success and Failure Among Student Nurses 


Table 2 
Preference Record Scores of Successful and Unsuccessful Nursing Students 








                                 In          Dropped,      Quit,       Quit,
                                 Training    Poor          Disliked    Other
                                             Scholarship   Training    Reasons
Preference Record Scale*         N = 74      N = 15        N = 7       N = 14

Mechanical        Mean           50.0        50.7          58.0        49.9
                  S.D.           14.7        15.1          12.9        14.5
                  Centile        55          57            73          55

Computational     Mean           28.7        26.6          23.7        35.1
                  S.D.           10.6        8.7           6.4         11.7
                  Centile        53          44            33          74

Scientific        Mean           63.0        62.8          67.4        67.2
                  S.D.           14.1        10.1          19.4        14.6
                  Centile        76          76            81          81

Persuasive        Mean           53.4  56.0  51.9
                  S.D.           9.5  15.1
                  Centile        17  24  16

Artistic          Mean           48.6  50.8  41.9
                  S.D.           15.0  14.4
                  Centile        46  52  30

Literary          Mean           46.0  49.4
                  S.D.           12.9  7.6  15.4
                  Centile        44          29            25          53

Musical           Mean           23.0  22.6  23.0
                  S.D.           10.0        11.4          8.3         8.3
                  Centile        49          49            60          49

Social Service    Mean           100.2       103.2         91.1        97.2
                  S.D.           14.2        11.5          13.4        11.7
                  Centile        87          90            74          83

Clerical          Mean           51.5        53.0          48.1        55.5
                  S.D.           13.9        13.4          10.4        12.5
                  Centile        24          27            17          33





*Centiles taken from norms based on 3863 high school students published by 
G. F. Kuder, 1944. 


dropped or had quit training during the same period. The fifteen nurses
who were dropped for poor scholarship scored significantly lower (critical
ratios of 3 or more) on three of the four aptitude tests when
compared with the successful nurses. The single exception was the arith- 
metic test which yielded a critical ratio of 1.6 for the difference between 
the means of the two groups. While it is unlikely that all such academic 







failures can ever be detected by the use of tests alone, in the instance at 
hand about half of the poor scholarship group could have been eliminated 
on the basis of low test performance. This reduction in the number of
failures would have been important when viewed from the standpoint 
of the financial saving to the hospital and the probable humiliation which 
the dropped students would have been spared. 
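The critical ratios cited above compare two group means by dividing their difference by its standard error. The sketch below assumes, as Table 1 appears to show, that the second figure under each test is the standard deviation, and it uses the independent-groups formula √(σ₁²/N₁ + σ₂²/N₂) for the standard error of the difference; the source does not say exactly which formula was used. Under these assumptions the Nurses’ Arithmetic means reproduce the critical ratio of about 1.6 quoted in the text, and the A. C. E. means give a ratio above 3.

```python
import math


def critical_ratio(mean1: float, sd1: float, n1: int,
                   mean2: float, sd2: float, n2: int) -> float:
    """Difference between two group means divided by its standard error,
    treating the two groups as independent samples."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


# Table 1 values: in training (N = 74) vs. dropped for poor scholarship (N = 15).
arithmetic = critical_ratio(33.1, 8.9, 74, 28.8, 9.8, 15)
ace = critical_ratio(99.9, 24.1, 74, 77.9, 23.5, 15)
print(round(arithmetic, 1), round(ace, 1))  # 1.6 3.3
```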


Table 3 
Comparison of Age, Training, and Physical Data for Student Nurses * 








Dropped, Quit, Quit, 
In Poor Disliked Other 
Training Scholarship Training Reasons 
N = 74 N = 15 N =7 N = 14 





Mean Age in Years 19.2 19.3 18.4 19.2 
Mean Months in Training 18.9 10.2 4.9 13.8 
Mean Weight in Pounds 133.1 121.5 141.0 125.2 
Mean Height in Inches 64.8 63.6 65.1 63.9 





* Age, height, and weight data taken at time of entering training; months in training 
represents the months of training completed at the time of the study. 


But the group of 7 nurses who voluntarily dropped training because 
they disliked it present a different problem which is in part outside the 
realm of tests. The test performance of this group, as shown in Table 
1, is about midway between the successful and the poor scholarship 
groups. While the number of cases in this group is too small for detailed 
consideration, there are several gross trends which may, at least, be ex- 
amined for what they are worth and as deserving of further study with 
larger groups. Compared to the successful group, it will be noted in 
Table 2 that the students who disliked nurses’ training scored 18 centiles 
higher on the mechanical and 13 centiles lower on the social service scales 
of the Preference Record. In Table 3, it will be seen that, although these 
girls averaged eight-tenths of a year younger, they were considerably 
heavier but not much taller than the group of 74 nurses still in training. 
During interviews, these girls complained that nurses’ training was not 
what they had expected it would be. When they were asked what their 
expectations had been, the general tenor of their replies indicated that 
they believed they would be given more professional duties to perform, 
and instead they were assigned work which they found menial and fa- 
tiguing. They seemed to have a kind of Hollywood vocational stereotype 
(1, p. 52) of the nursing profession. Apparently they had expected to 
manipulate various gleaming, complicated instruments while assisting at 
operations. Instead, they made beds and rubbed patients’ backs. 







Just how extreme the unrealistic concept of nursing held by these
students was is difficult to ascertain, because their illusions were by
now thoroughly shattered. Nor is it clear just how their stereotype of
nursing developed. The sources of this unrealistic notion are probably to
be found in radio serials, movies, and cheap magazine stories. The plight
of these girls is perhaps accurately summarized in what Wendell Johnson¹
calls the IFD sequence: from Idealism to
Frustration to Demoralization. Being younger, the students who disliked 
training and quit may have had less time for experience in the world at 
large to dissolve their stereotype of nursing. It is also possible that fa- 
tigue was a factor since this group, although younger, averaged 141 
pounds in weight as compared to a range of 121 to 133 pounds for the 
mean weight of the other groups. 

A final question may be raised concerning the interests of this group 
of girls who quit training. Their scores for interest in mechanical activ- 
ities, it will be recalled, averaged 18 centiles higher than those for the suc- 
cessful group. With a much larger group this might be interpreted as 
a factor in increasing dissatisfaction with nursing since they began train- 
ing with the expectation of handling various mechanical devices. While 
only a suggestion, it is perhaps significant in this connection that data now
being gathered reveal that practicing graduate nurses in specialties in
which a number of instruments are handled, as anesthesia or surgical 
nursing, averaged scores which were at the 76th centile on the mechanical 
scale of the Preference Record while those in other specialties scored below 
the 50th centile. 

There were 14 student nurses who dropped training after an average 
of 13.8 months in school. Twelve of these students dropped voluntarily 
because of poor health, a desire to marry, or because of home responsi- 
bilities. Two were dropped by the school because of ineptitude in per- 
forming their practical nursing duties. In general, there was nothing of 
significance which distinguished this group from the group of successful 
nurses. Their classroom grades and aptitude test scores were about the 
same as those earned by the students still in training. There were some 
differences in Preference Record scores (Table 2); but because of the varied
reasons for quitting training, it is difficult to observe any relationship in 
them. Scores on the Multiple Choice Test (12) were distributed in about 
the same manner as the scores of the other groups. While there is prob- 
ably no reason why the Multiple Choice Test should reveal any differences 
for any of these groups, it may be noted here that, whether corrected for 
anatomical responses or not, the distribution for the group of successful 
nurses overlapped those made for the other groups. 


¹ Johnson, Wendell. People in quandaries. New York: Harpers, 1946. P. 14f.













It is possible that manual dexterity tests may have some usefulness 
in eliminating some of those who fail in the practical work required dur- 
ing nurses’ training. The two students who were dropped for this reason 
replied, when asked, that they had earlier encountered considerable
difficulty in learning skills requiring motor activity, such as typewriting. But it
must be acknowledged that nothing is known about what similar diffi- 
culties may have been encountered by the successful nurses. On the 
other hand, it is not impossible that appropriate personality tests might 
afford some useful information on this point since in some cases, according 
to several nursing supervisors, a kind of schizoid preoccupation with per- 
fection in such simple tasks as bedmaking may result in extremely reduced 
efficiency in ward duties. 

Unless certain major changes occur in the tasks required of practicing nurses, it appears reasonable to predict that either the total number of new graduate nurses will steadily decrease or the admission standards, which have been progressively raised during past decades, will reverse the present trend and become lower. The former is more likely, since Gelinas (10, p. 42f) and others are already recommending admission standards which include specific courses such as chemistry, biology, physics, cooking, and nutrition, and are also suggesting the addition of two years of college work. To understand what major changes are needed, it is only necessary to examine what Gelinas (10, p. 26ff) says of personnel policies and standards in nursing and nursing education. She points out that, while nurses undergo a lengthy training period and are subjected to health hazards rarely met in other fields, the pay is relatively low and the hours are long. Many hospitals require that full or partial maintenance be accepted as part of the salary, and most hospitals pay the same rates for night work as for day work. Also, she states, too large a percentage of hospitals require more than 48 hours' duty a week.

The 250 nurses in Blazier's (3) study of 1924 listed good health, endurance, and good feet, in that order, as the top three physical qualities necessary for success in nursing. The same physical qualities probably remain essential today, for 17 per cent of the nursing schools in 1943 required 53 or more hours per week of their students' time for ward duties, exclusive of classroom or study time (10, p. 39). The school with which the present study is concerned, it should be emphasized, is a clear exception in that ward and classroom hours combined never exceed 48 hours per week.

In order to attract the best nursing students, especially if admission standards continue to rise, it seems essential that the emphasis upon physical activity be reduced to a point where endurance and good feet are no longer two of the three most important physical qualities. Perhaps what is needed is an extension of the wartime nurse's aide positions.







Professional nurses could then organize and supervise ward work as well as perform other necessary functions, while the bathing of patients, recording of temperatures, and a host of other simple duties could be left to trained but non-professional aides. While making nursing more attractive, such changes should also permit better service at slightly lower cost.


Summary 


Of 110 students who met high admission standards upon entering a 
school of nursing, 36 or 32.7 per cent had been dropped or had quit after 
19 months of training. Fifteen of the 36 were dropped for poor scholar- 
ship. Data from a test battery revealed that this poor scholarship group 
performed significantly lower on tests of scholastic and nursing aptitude. 
Seven of the students quit because they disliked training. There was 
some evidence that these students had interests different from the group 
of 74 successful nurses and that they had an unrealistic notion of what the 
duties of nurses were. Fourteen others dropped training for various 
reasons such as poor health, home responsibilities, or a desire to marry. 
Little was found in the information available which set this group apart 
from the successful nursing students. It is believed that with progress- 
ively higher admission and training standards, fewer girls will be inter- 
ested in nursing as a career unless the physical demands of general duty 
nursing are lessened by employing nurse’s aides or similar assistants to 
handle many of the routine and physically exhausting tasks. 
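The attrition figures in the Summary can be checked with a little arithmetic. The sketch below only restates the counts given in the text (110 entrants; 15, 7 and 14 losses by reason). The grouping labels are paraphrases, not the author's categories verbatim.

```python
# Counts taken from the Summary; the dictionary labels are paraphrases.
ENROLLED = 110
LOSSES = {
    "dropped for poor scholarship": 15,
    "quit because they disliked training": 7,
    "left for other reasons (health, home, marriage)": 14,
}


def attrition_summary(enrolled: int, losses: dict[str, int]) -> dict[str, float]:
    """Return total losses, students remaining, and the loss rate in per cent."""
    lost = sum(losses.values())
    return {
        "lost": lost,
        "remaining": enrolled - lost,
        "per_cent_lost": round(100 * lost / enrolled, 1),
    }


if __name__ == "__main__":
    print(attrition_summary(ENROLLED, LOSSES))
```

The three loss groups sum to the 36 students reported, leaving the 74 successful students used as the comparison group.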


Received September 3, 1946. 
References

1. Bailey, H. W., Gilbert, W. M., and Berg, I. A. Counseling and the use of tests at the Student Personnel Bureau in the University of Illinois. Ed. and Psychol. Meas., 1946, 6, 37-60.

2. Bennett, G. K., and Gordon, H. P. Personality test scores and success in the field of nursing. J. appl. Psychol., 1944, 28, 267-278.

3. Blazier, F. E. Investigation of nursing as a professional opportunity for girls. School Ed. Bull., Indiana U., 1924, 6, 1-69.

4. Brooks, E. The value of psychological testing. Amer. J. Nurs., 1937, 37, 885-890.

5. Burr, M. The MacQuarrie test for mechanical ability. Amer. J. Nurs., 1934, 34, 378-381.

6. Douglas, H. R., and Merrill, R. A. Prediction of success in schools of nursing. U. Minnesota Studies in Prediction of Scholastic Achievement, 1942, 2, 17-31.

7. Earle, M. G. Intelligence testing of probationers. Amer. J. Nurs., 1923, 23, 866-869.

8. Faber, M. S. Mental tests and measurements. Amer. J. Nurs., 1928, 28, 265-271.

9. Garrison, K. C. The use of psychological tests in the selection of student nurses. J. appl. Psychol., 1939, 23, 461-472.

10. Gelinas, A. Nursing and nursing education. New York: Commonwealth Fund, 1946.

11. Habbe, S. The selection of student nurses. J. appl. Psychol., 1933, 17, 564-580.

12. Harrower-Erickson, M. Multiple Choice Test. Madison, Wisc.: M. Harrower-Erickson, 1943.

13. Hunt, T. Aptitude test for nursing. Washington, D. C.: Center for Psychological Services, 1940.

14. Hunt, T. Arithmetic test for prospective nurses. Washington, D. C.: Center for Psychological Services, 1940.

15. Hunt, T. Reading comprehension test for prospective nurses. Washington, D. C.: Center for Psychological Services, 1940.

16. Hyman, A., and Dreyfuss, R. How intelligent should our nurses be? Amer. J. Nurs., 1930, 30, 490-494.

17. Kuder, G. F. Preference Record (Form BM). Chicago: Science Research Associates, 1942.

18. MacLean, M. S. The selection of student nurses and the treatment of failures. Amer. J. Nurs., 1932, 32, 1297-1307.

19. McPhail, A. H. Psychological tests for selecting nurses at the Rhode Island Hospital. Amer. J. Nurs., 1929, 29, 203-206.

20. McPhail, A. H. A follow-up on a testing program; a review of data: 1934-1937. Amer. J. Nurs., 1937, 37, 890-893.

21. Metcalfe, L. Achievement of nurses in relation to intelligence ratings. Proceedings, Nat. League Nurs. Ed., 1928, 174-175.

22. Potts, E. M. Testing prospective nurses. Occupations, 1945, 23, 328-334.

23. Rhinehart, J. B. An attempt to predict the success of student nurses by the use of a battery of tests. J. appl. Psychol., 1933, 17, 277-293.

24. Sartain, A. Q. Predicting success in a school of nursing. J. appl. Psychol., 1946, 30, 234-240.

25. South, E. B., and Clark, G. Y. Some uses of psychological tests in schools of nursing. Amer. J. Nurs., 1929, 29, 1495-1498.

26. Triggs, F. O. Personnel work in schools of nursing. Philadelphia: Saunders, 1945.

27. Williamson, E. G., and others. Testing for nursing aptitude. Amer. J. Nurs., 1937, 37, 893-895.

28. Young, H. H. Intelligence ratings and success of nurses in training. J. appl. Psychol., 1924, 8, 377-390.





The Construction of a Test of Ability to Repeat 
Spoken Messages * 


John C. Snidecor 
Santa Barbara College, University of California 


and 
Theodore D. Hanley 


Veterans Guidance Center, University of Iowa 


A modern navy ship is maneuvered and fought by remote control. 
The various units of a large battleship are integrated by means of hun- 
dreds of telephone connections. Even a small ship will have dozens of 
telephones interconnecting the bridge with gun stations, the engine room,
repair parties, and lookouts. The telephone used at each outlet is sound- 
powered, that is, the energy for producing the electrical current is supplied 
by the voice alone. The sound-powered phone is a rugged, dependable 
instrument, but is relatively low in loudness and fidelity when compared 
with a good electrical telephone. 

Most phone outlets are manned by enlisted battle phone talkers who
relay messages that originate from various parts of the ship. Usually 
the message originates with an officer, is spoken by a talker, received by 
a talker, and then repeated to the officer who is responsible for acting 
upon the message received. If one link in this chain is broken or faulty, 
important messages may be inaccurately relayed to the danger of ship 
and crew. 

Approximately seventy per cent of the enlisted personnel on a Navy 
ship serve at one time or another as battle phone talkers. At the time of 
the Navy’s greatest expansion, a need was felt and expressed for a test 
which would aid in the selection of men to be assigned to talker billets, 
and also aid in selecting men to receive operational training in the tech- 
nique of battle phone talking. 

The problem of selection was initially met by developing a Speech Interview, which was designed to rate men in terms of their relative intelligibility. The Speech Interview as developed by the Applied Psychology Panel of the National Defense Research Committee¹ proved to satisfy this need.

* This article, although cleared by the U. S. Navy for publication, does not necessarily represent Naval policy or practices. It represents solely the research findings and opinions of the authors.

It was soon found desirable to supplement the Speech Interview with a test that would measure the ability of the potential talker to remember and repeat exactly the messages he is called upon to relay. Further, it was considered desirable that such a test be combined with the Speech Interview for ease and rapidity of administration. The writers were assigned the duty of constructing a test of the ability to repeat spoken messages, called for convenience a Command Memory Test. As with the Speech Interview, the objective of the test was the selection of men for training or immediate assignment as battle phone talkers.


Experimental Procedure 


The first step in construction of the test was the collection of a body of test items. From standard sources dealing with messages and commands spoken over ships' interior communication circuits, six lists of thirty messages each were prepared. The messages on each list were arranged in apparent order of difficulty of repetition, the short messages at the beginning and the longer ones at the end. An attempt was made to match the six lists in terms of difficulty: insofar as was possible through subjective evaluation of the test items, equal numbers of easy, moderately difficult, and difficult messages were placed on each list.

After the test materials were assembled, a testing room was set up at one of the major training centers. In this room test booths were constructed, ten booths on each side of a long mess table. The ten booths on one side of the table were equipped with sound-powered phones for the use of the test subjects; the ten booths facing these were designed for the use of trained testers. One sound-powered phone, called the master phone, was connected to the bank of phones used by the subjects. This phone was set up on a small table in the center of the room for the use of a criterion tester.

Individual testers for the project were selected from a group of enlisted men who were service school candidates temporarily stationed at the naval training center. The ten of these seamen making the highest scores on the Listening-in-Noise and Digit Memory tests² were assigned the testing duty and were instructed in the methods to be used. It was explained to them that each tester would be seated opposite a subject on the double bank of test booths. The subjects, wearing earphones, would be told by the criterion tester to repeat exactly the messages read by him into the master phone. Each individual tester would check the correctness of the message repetition on a form provided for that purpose. Additions, omissions, inversions, substitutions, and incomplete messages were to be marked incorrect. The testers would score the test by subtracting the number of incorrect items from the total number of items on the test form.
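The scoring rule above (any addition, omission, inversion, substitution, or unfinished repetition makes an item incorrect; score equals items minus errors) can be sketched as a word-for-word comparison. In the study, human testers judged repetitions by ear; treating the check as an exact token match that ignores punctuation and capitalization is an assumption of this sketch, and the function names are hypothetical.

```python
import re
from typing import Optional

# Words, digits, and hyphenated number words such as "one-seven" count as tokens;
# commas, semicolons, periods, and the dashes between station names are ignored.
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokens(message: str) -> list[str]:
    """Split a message into lower-case word tokens."""
    return TOKEN.findall(message.lower())


def item_correct(spoken: str, repeated: str) -> bool:
    """True only for an exact word-for-word repetition.

    An addition, omission, inversion, or substitution changes the token
    sequence, so any of them makes the item incorrect.
    """
    return tokens(spoken) == tokens(repeated)


def form_score(spoken_items: list[str], repetitions: list[Optional[str]]) -> int:
    """Score = number of items minus number of items marked incorrect.

    None stands for a missing response; a partial repetition is caught
    as an omission by item_correct.
    """
    if len(spoken_items) != len(repetitions):
        raise ValueError("need exactly one repetition (or None) per item")
    incorrect = sum(
        1
        for spoken, rep in zip(spoken_items, repetitions)
        if rep is None or not item_correct(spoken, rep)
    )
    return len(spoken_items) - incorrect
```

On a 33-item form, a subject who misses 17 items would score 16, close to the Form I mean reported later.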

After receiving these instructions, the testers were given practice in test scoring and were then checked for scoring reliability. All testers, listening together, marked test forms on each of several successive subjects. Agreement among testers on total scores for each subject was so close, a substantial majority always scoring identically, that it was decided to compare each tester's individual scores with the modal score for each subject. This procedure prevented one or two deviant judgments from unduly influencing the measure of central tendency, as could have happened had the mean score for each subject been taken as the standard against which individual judgments were compared. Further, since total scores were always whole numbers, it appeared logical to select a measure of central tendency that was itself a whole number.³

1 Report No. 1. A Speech Interview for the Selection of Telephone Talkers, Project N-109. Report No. 3. A Study in Training Petty Officers to Select Telephone Talkers, Project N-109.

2 Tests developed by the Psycho-Acoustic Laboratory, Harvard University, S. S. Stevens, Director.

With the reliability of the testers established, each of the six lists of standard messages was administered to 100 subjects in groups of ten in the manner described. The subjects were unselected men undergoing recruit training at the naval training center. Messages were read by the criterion tester at a constant rate (approximately 140 words per minute) into the sound-powered phone circuit. Each subject heard the messages over a set of earphones and repeated them to a tester seated opposite him. No signal was given to mark when the subject was to begin repeating a message; the cue was the end of the message plus the inflection of the criterion tester's voice. This practice was adopted to make the test duplicate actual shipboard conditions as closely as possible. The intelligibility of the messages as read by the criterion tester was not considered a factor in their difficulty of repetition, since in many evaluations of his speech by trained testers (in the Speech Interview) he was always rated in one of the two highest brackets, “Very Good” or “Excellent.” In reducing the six lists of messages to the two lists which later came to comprise the Command Memory Test, the test items which were missed most and least frequently were discarded as too difficult and too easy, respectively. Items which did not discriminate between better and poorer subjects were also discarded.
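The item screening described here, combined with the rough upper-third versus lower-third discrimination index the authors describe under Results, can be sketched as follows. The paper does not give its numerical cut-offs or the significance test it applied to the upper/lower difference, so `low`, `high` and `min_gap` are hypothetical parameters; a fixed gap in percentage points stands in for the significance test.

```python
def item_analysis(missed: list[list[int]], low: float, high: float,
                  min_gap: float) -> list[int]:
    """Return the indices of items worth keeping.

    missed[s][i] is 1 if subject s missed item i, else 0.
    low, high: acceptable range for the per cent of all subjects missing an item.
    min_gap: required excess, in percentage points, of misses in the lower
             third over misses in the upper third (a hypothetical cut-off).
    """
    n_subj, n_items = len(missed), len(missed[0])
    if n_subj < 3:
        raise ValueError("need at least three subjects to form thirds")
    # Total score = items repeated correctly; rank subjects from best to worst.
    totals = [n_items - sum(row) for row in missed]
    ranked = sorted(range(n_subj), key=lambda s: totals[s], reverse=True)
    third = n_subj // 3
    upper, lower = ranked[:third], ranked[-third:]

    def pct_missing(group: list[int], i: int) -> float:
        return 100 * sum(missed[s][i] for s in group) / len(group)

    everyone = list(range(n_subj))
    kept = []
    for i in range(n_items):
        difficulty = pct_missing(everyone, i)       # per cent missing the item
        gap = pct_missing(lower, i) - pct_missing(upper, i)
        if low <= difficulty <= high and gap >= min_gap:
            kept.append(i)
    return kept
```

With `low=8` and `high=96`, the filter would admit exactly the difficulty range the authors report for the final forms (items missed by 8 to 96 per cent of subjects).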

The remaining test items, arranged in order of difficulty, were placed in two equivalent 33-item test forms. In succeeding separate sections of the main study, Command Memory Test Forms I and II were administered under the following conditions:


A. For a reliability check, Forms I and II were administered successively 
to 100 subjects, Form I preceding II for 50 of the subjects and succeeding II 
for the remaining 50. 

B. In an attempt to gauge the effect of distracting noise on the ability to 
repeat spoken messages, Forms I and II were administered successively to 100 
subjects as described above. Accompanying this administration was the play- 
ing of a battle noise phonograph recording. 

C. Forms I and II were recorded on phonograph discs and the tests were 
administered to 100 subjects, the sound from the record player being fed 
directly into the sound-powered phone circuit. 

D. To ascertain what differences in results, if any, arise when the test is administered by an untrained tester, Forms I and II were administered successively by the criterion tester and one of the booth testers. The latter was an enlisted man 18 years of age who had had no academic training in speech or testing, and no specific training in administering the Command Memory Test beyond the principles he had learned in listening to administrations of the test over a three-week period. For one group of 50 men, the trained tester administered one form of the test and the untrained tester then administered the other form; for a second group of 50 men, the untrained tester administered one form and the trained tester the other.

3 From time to time, due to the urgency of military assignments, it became necessary to replace testers. Replacements always underwent the instruction and reliability check described above.

E. Certain basic data about each of the test subjects were available to the 
writers. The attempt was made to discover what relationships exist between 
the ability to repeat spoken messages and these other data. This section of 
the study included the following: 

1. Scores on Form I for 370 subjects were plotted on a scatter diagram 
against the General Classification Test (a Navy basic aptitude test) scores for 
the same subjects. Scores on Form II for 135 subjects were plotted against 
General Classification Test scores for the same subjects. 

2. Scores on Command Memory Test Form I for 371 subjects were plotted 
on a scatter diagram against the number of school years completed by each 
man. This was done also with Form II scores for 134 subjects. 

3. Scores on Form I for 370 subjects were plotted on a scatter diagram 
against the age of each subject. This was done also with Form II for 133 
subjects. 

F. Command Memory Test Forms I and II were administered to 100 subjects together with the Digit Memory Test prepared by the Harvard University Psycho-Acoustic Laboratory. For 50 subjects the Command Memory Test preceded the Digit Memory Test, and for 50 subjects it succeeded the Digit Memory Test.

G. Command Memory Test Forms I and II were administered to 100 
subjects with the Speech Interview, which rates intelligibility over sound- 
powered phones. For 50 subjects the Command Memory Test preceded the 
Speech Interview, and for 50 subjects it succeeded the Speech Interview. 

H. The difficulty of each item in the two test forms, as indicated by the 
per cent of men missing each item, was plotted on a scatter diagram against 
the length in words of each item. This was repeated, plotting difficulty against 
length in syllables of each item. 


Results 


Testers’ Reliability. Of the final total of 44 testers used in scoring 
the Command Memory Test, 41 were in complete agreement with the 
modal score for individual subjects used in the reliability check on at 
least 50 per cent of the cases. Twenty-one were in complete agreement 
with the mode on at least 75 per cent of the cases. Six were in complete 
agreement with the mode on at least 90 per cent of the cases. 

Thirty-eight of the 44 testers were within two points of the mode on 
100 per cent of the cases. Only one tester deviated from the mode by 
as much as four points, and that on only one case. 

For all the testers the mean amount of difference from the mode, con- 
sidering only those cases where there was a difference, was 1.14. The 
mean amount of difference, considering all cases used in the reliability 
check, was 0.28. 

Assembly of Items for Final Test Forms. As has been stated, the items 
used in the final Forms I and II of the Command Memory Test were those 
remaining after the extremely easy and extremely difficult items had been 
culled from the original 180 items in the six preliminary test forms. The 
easiest item used in the final form was one missed by eight per cent of the 







subjects. The most difficult item was missed by 96 per cent of the sub- 
jects. The other items were arranged in order of difficulty between these 
two extremes. An equally important criterion used in the selection of 
the items to be used in the finished test was the factor of discrimination. 
To arrive at a rough discrimination index, the writers grouped the sub- 
jects into upper, middle, and lower thirds, in terms of total scores on the 
Command Memory Test. They then determined the per cent of men in 
each third who missed each item. Those items finally selected were ones 
on which there was a significant difference between the per cent of men in 
the upper and lower groups missing the items. The items are as follows: 


Command Memory Test, Form I 


(The following are the standard messages and commands which are read 
by the Criterion Tester and repeated by the test subject.) 


1. Main deck stations look alive with your lines.
2. Conn—Main Deck Aft—my transmitter is dead.
3. All stations—Bridge—torpedo defense.
4. Bridge—Fog Lookout—destroyers are on station.
5. Central—Conn—reduce port list.
6. Bridge—Starboard Lookout One—submarine surfacing.
7. All guns, match pointers and shift to automatic.
8. Port Chains—Conn—call the First Lieutenant there.
9. Action starboard; train on surface target.
10. Bridge—Port Lookout Two—submarine is returning fire.
11. Central—Repair Four—a heavy explosion felt forward.
12. Plot and Control—Bridge—we are making one-seven knots.
13. Frame one-three-eight starboard is now manned and ready.
14. Central—Repair One—send two stretchers to scene of explosion.
15. Bridge—Starboard Lookout One—destroyer is dropping charges.
16. Bridge—Horizon Lookout—one carrier closing distance.
17. Repair Two—Conn—report to Conn when air line is fixed.
18. Main Control—Bridge—port engines ahead two-thirds.
19. Gun One—ten rounds expended, no casualties, gun empty.
20. Central—Bridge—correct for list by counterflooding.
21. All stations—Spot One—stand by for firing circuit check.
22. All stations—Central—stand by for bomb hits; enemy bombers sighted.
23. Sky Control—Control—the tow plane is making a run to port.
24. Repair One—Damage—test all risers in your area for breaks.
25. Control—Mount One—five torpedoes expended, no casualties.
26. Sky Control—Radar—estimated speed two-seven-five knots.
27. Central—Repair One—fire is spreading to second deck, outboard.
28. Shift to local control; plane bearing zero-six-zero.
29. Central—Patrol One—air line to number six mount is broken.
30. Damage—Central—take charge of Repair Two operation.
31. Turret Two—Spot One—Gunnery Officer is in Conning Tower.
32. Main Deck Aft—Conn—we will pass orders to the tugs over this line.
33. Repair One—Central—surround the fire in Baker one-zero-four.


Test Reliability. The decision to use 33-item test forms was arrived at 
after test reliability checks had been made on 40-item, 33-item, 30-item 













and 20-item test forms. The 33-item form combined the advantages of 
high reliability and speed of administration. 

Mean score for 100 subjects on Form I was 16.18, standard deviation 
7.89. Reliability, computed by the Kuder-Richardson formula, was .896. 
Mean score for the same 100 subjects on Form II was 15.88, standard 
deviation 7.68, reliability .886. Product-moment correlation between 
the two forms was .892. 
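The reliability coefficients above can be reproduced from raw item data. The text says only "the Kuder-Richardson formula"; this sketch assumes KR-20, r = k/(k-1) × (1 - Σpq / σ²), where k is the number of items, p the proportion passing each item, q = 1 - p, and σ² the variance of total scores (population variance is assumed). A product-moment correlation between Form I and Form II totals gives the between-forms figure.

```python
from math import sqrt
from statistics import mean, pvariance


def kr20(correct: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for a subjects-by-items 0/1 matrix.

    correct[s][i] is 1 if subject s repeated item i correctly, else 0.
    """
    n, k = len(correct), len(correct[0])
    totals = [sum(row) for row in correct]
    var_total = pvariance(totals)                 # variance of total scores
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in correct) / n    # proportion passing item i
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation, e.g. between Form I and Form II totals."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Applied to a 100 × 33 matrix of item results for one form, `kr20` would yield a figure comparable to the .896 and .886 reported; `pearson_r` on the paired form totals corresponds to the .892.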

Test Validity. It is apparent that the Command Memory Test has 
high face validity, since the items used are actual messages in frequent 
use by the men who man battle telephones. Also, the equipment used in 
administering the test, sound-powered phones, is identical with the 
equipment used on board ship. 

Command Memory Test under Conditions of Noise. When Forms I 
and II were administered against a background of battle noise, mean 
score for the two forms was 16.30, standard deviation 7.80. For the 
same subjects when the test was administered without noise, mean score 
was 16.47, standard deviation 7.65. Correlation between performance 
on the test with and without noise was .76. 

Command Memory Test, Recorded Administration. Forms I and II, 
when administered by phonograph recordings, proved to be more difficult 
for the subjects than when personally administered. Mean score for 
Form I was 11.33, standard deviation 5.49. Mean score for Form II was 
10.11, standard deviation 5.58. The difference between means when the 
test is administered personally and when it is administered by phono- 
graph recording, averaging 5.31, is statistically significant at the two per 
cent level of confidence. Correlation between the two forms on the re- 
corded administration was .644. 
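The figure of 5.31 is the average of the two form-by-form drops (16.18 - 11.33 = 4.85 for Form I and 15.88 - 10.11 = 5.77 for Form II). The paper does not say how significance was tested. This sketch assumes the personal (condition A) and recorded (condition C) administrations involved independent groups of 100 and applies a conventional critical ratio, (M1 - M2) / sqrt(SD1²/N1 + SD2²/N2), against the two-tailed 2 per cent point of the normal curve.

```python
from math import sqrt
from statistics import NormalDist


def critical_ratio(m1: float, sd1: float, n1: int,
                   m2: float, sd2: float, n2: int) -> float:
    """Difference between two independent means divided by its standard error."""
    se_diff = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se_diff


# (mean, standard deviation) as reported in the text
PERSONAL = {"I": (16.18, 7.89), "II": (15.88, 7.68)}
RECORDED = {"I": (11.33, 5.49), "II": (10.11, 5.58)}
N = 100  # assumed size of each group

diffs = {f: PERSONAL[f][0] - RECORDED[f][0] for f in PERSONAL}
average_diff = sum(diffs.values()) / len(diffs)
crit_2pct = NormalDist().inv_cdf(1 - 0.02 / 2)  # two-tailed 2% point, about 2.33
ratios = {f: critical_ratio(*PERSONAL[f], N, *RECORDED[f], N) for f in PERSONAL}
```

Under these assumptions both ratios exceed the 2 per cent point, which is consistent with the significance the authors report.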

Command Memory Test, Administration by Trained and Untrained 
Criterion Testers. Forms I and II, when administered by a trained tester 
to 100 subjects, had a mean score of 16.85, standard deviation 5.29.
When the forms were administered by an untrained tester to the same 
subjects, the mean was 17.37, standard deviation 5.90. The difference 
between test means when the test is administered by trained and un- 
trained testers is without statistical significance. Correlation between 
scores obtained in the trained and untrained administration was .834. 

Command Memory Test with General Classification Test. Form I, with 
a mean of 14.57, standard deviation 7.33 for 370 subjects, was found to 
have a Pearsonian correlation of .52 with the Navy General Classification
Test, mean 54.61, standard deviation 10.91. Form II, mean 13.65, stand- 
ard deviation 6.26 for 135 subjects, was found to have a correlation of .40 
with the General Classification Test, mean 55.11, standard deviation 9.99. 

Command Memory Test with School Years Completed. Form I, with the mean and standard deviation stated above, was found to have a .32 product-moment correlation with number of school years completed, mean 11.10, standard deviation 1.94. Form II, with mean and standard deviation as stated above, was found to have a .33 correlation with number of school years completed, mean 11.18, standard deviation 1.46.

Command Memory Test with Age. Form I showed a −.04 correlation with age of the subjects, mean 19.59, standard deviation 3.86. Form II showed a .004 correlation with age for the same subjects, mean 19.21, standard deviation 3.29. These correlation coefficients are not significantly different from zero.
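The text does not state how the age correlations were tested against zero. One standard check is t = r·sqrt(n - 2) / sqrt(1 - r²). The sample sizes come from procedure E (370 subjects for Form I, 133 for Form II). The normal 5 per cent point 1.96 is used as an approximation for samples of this size. Both the procedure and the critical value are assumptions, not the authors' method.

```python
from math import sqrt


def t_for_r(r: float, n: int) -> float:
    """t statistic (n - 2 degrees of freedom) for testing a Pearson r against zero."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def significant(r: float, n: int, critical: float = 1.96) -> bool:
    # With a hundred or more cases the t distribution is close to normal, so the
    # two-tailed 5% normal point is used here as an approximate critical value.
    return abs(t_for_r(r, n)) > critical
```

By this check the age correlations (−.04 with n = 370, .004 with n = 133) fall well short of significance, whereas the .52 correlation with the General Classification Test on 370 cases is far beyond it.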

Command Memory Test with Digit Memory Test. Both Command 
Memory Test forms, with a mean of 13.03, standard deviation 6.96 for 
100 subjects, were found to have a correlation of .449 with the Digit 
Memory Test, mean 9.81 and standard deviation 3.46. 

Command Memory Test with Speech Interview (Intelligibility). Both 
forms of the test, with a mean of 15.51, standard deviation 6.36 for 100 
subjects, were found to have a coefficient of correlation of .236 with per- 
formance on the Speech Interview, mean 3.78, standard deviation 1.10. 

Message Length and Difficulty. Message length in words, mean 8.79, standard deviation 1.84, was found to have a correlation of .63 with item difficulty. The average per cent of men missing each item (used as the scale of item difficulty) was 55.22, standard deviation 22.74. Correlation of message length in syllables with message difficulty was .70.


Application of the Test 


Under experimental conditions it has been found possible to combine 
the Command Memory Test, measuring the ability to remember and re- 
peat spoken messages, with the Speech Interview, measuring intelligibility 
over sound-powered phones. This combination is considered by Navy 
officials to be desirable as a time-saving factor and as a ready means of 
determining eligibility for battle phone training or direct assignment. 
The combined tests, now termed simply “Talker Test,” are administered as follows:


1. The subject hears, repeats, and is scored on the messages in the 
Command Memory section of the test. 

2. The subject reads a 100-word printed paragraph. 

3. The subject is required to speak extemporaneously on some simple 
subject, and is then scored on intelligibility. 


Cutting scores have been established to disqualify approximately 50 per cent of the men who take the test. Men scoring ten or below on the Command Memory section, or rated “Very Poor” or “Inferior” on the intelligibility section, are automatically marked “Not Qualified” on the Talker Test. Men who score in the 11 to 20 range on the memory section and are also rated “Poor” on the intelligibility section likewise fail the test.
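The cutting scores just described amount to a simple decision rule. This sketch encodes only the failing combinations the text names. Treating every other combination as passing, and the label "Qualified", are assumptions; the full rating scale of the Speech Interview is not given here.

```python
# Ratings named in the text as automatically disqualifying.
DISQUALIFYING_RATINGS = {"Very Poor", "Inferior"}
MAX_MEMORY_SCORE = 33  # each Command Memory form has 33 items


def talker_test_result(memory_score: int, intelligibility: str) -> str:
    """Apply the cutting scores described for the combined Talker Test.

    Any combination the text does not name as failing is treated as passing,
    which is an assumption of this sketch.
    """
    if not 0 <= memory_score <= MAX_MEMORY_SCORE:
        raise ValueError("Command Memory score must lie between 0 and 33")
    if memory_score <= 10 or intelligibility in DISQUALIFYING_RATINGS:
        return "Not Qualified"
    if 11 <= memory_score <= 20 and intelligibility == "Poor":
        return "Not Qualified"
    return "Qualified"  # label for passing candidates is hypothetical
```

For example, a memory score of 15 fails with a "Poor" rating but would pass with a better one, while a score of 21 passes even with "Poor".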

The Talker Test is now administered at the “P” (preparatory) School level, where Navy enlisted men are classified for further talker training afloat or ashore.

A manual⁴ has been written describing the nature of speech and of memory and the techniques to be used in administering and scoring the combined Speech Interview and Command Memory Test. The manual is distributed to the personnel concerned at the centers described above.


Summary 


To meet the Navy’s need for a test which would measure a man’s 
ability to remember and repeat messages using standard shipboard no- 
menclature, such test to be used as a preliminary to the selection and 
training of battle phone talkers, the Command Memory Test was con- 
structed. The items which made up the test as finally evolved consist of 
standard shipboard messages and commands, taken from official sources. 
These messages were arranged in six comparable lists of thirty items each. 

Repeated administrations of these test lists to enlisted personnel of 
a naval training center who acted as subjects served to indicate which 
items were most discriminative and were best adapted in terms of item 
difficulty to the purpose stated above. The items which proved to be 
most acceptable in these two characteristics were arranged in order of 
difficulty into two equivalent test forms, each of which contains thirty- 
three items. 

Administration of the two test forms under circumstances of con- 
trolled variation resulted in the following findings: 


1. Both test forms have high face validity, since the messages used are 
taken from official sources which record authentic telephone talker pro- 
cedure; thus subjects are tested with the actual materials they will be 
called upon to use in the billet for which they are being tested. 

2. Both Form I, with a Kuder-Richardson r of .896, and Form II, 
with an r of .886, have satisfactory reliability. The two test forms are 
directly comparable, with differences in means and standard deviations 
so small as to be without statistical significance. Pearsonian correlation 
between the two forms is .892, another indication of reliability. 

3. Administration of the Command Memory Test to the accompani- 
ment of battle noise recordings does not result in significant differences 
in scores from those made when the test is administered without noise. 


4 A Manual for the Selection of Telephone Talkers, NavPers 16557. “Restricted,” and thus not available to the general public.







One conclusion to be drawn from this finding is that it is not necessary 
to administer the test under conditions of extreme realism, since results 
are very similar without the sound effects. 

4. When a recorded version of the test forms is substituted for per- 
sonal administration, means and standard deviations of the forms show a 
significant decrement. Under these conditions the two forms correlate 
much lower than when the administration is personal, and hence the forms 
are less reliable. 

5. When the test forms are administered by a man with no previous experience in speech or talker testing beyond the amount acquired while serving for a short time as a tester, test score means and standard deviations do not differ significantly from those obtained when the test is administered by a trained criterion tester. The conclusion is that with a minimum of training any enlisted man with speech of average intelligibility may be qualified to administer the Command Memory Test.

6. High reliability in scoring the test may be expected with only a 
moderate amount of training for the testers. 

7. The ability to remember and repeat spoken messages is not highly 
correlated with any of the following variables: score on the Navy General 
Classification Test, memory for digits, number of school years completed, 
intelligibility over sound-powered phones, and age. Therefore, the Command
Memory Test is a measure available to the Navy that will satisfy
the specific need stated above.

8. There is a moderately high degree of correlation between the diffi- 
culty of repetition of spoken messages and their length in words and 
syllables. 


Received August 31, 1946. 








Visual Acuity in Relation to Illumination in the Ortho-Rater 


Richard Feinberg and S. Edgar Wirt 


Division of Education and Applied Psychology, Purdue University, 
Lafayette, Indiana 


Educational institutions, industry, and the ophthalmic professions 
have made wide use of batteries of rapid vision tests which provide some 
index to the visual skills of the student, employee, or patient. Such test
results are used to predict the individual's probable occupational success
and his need for professional care, and they give the optometrist or
ophthalmologist further cues in determining quantitatively the patient's
visual problems and progress in vision training. Visual acuity is the
visual skill most universally tested.

Traditional devices for measuring visual acuity expose printed slides 
or charts, sometimes illuminated by an auxiliary lamp, but often with 
no provision for the establishment or maintenance of any given standard 
level of illumination. The Ortho-Rater,1 in order to achieve maximum
validity and reliability in the test results, has the illumination source
together with translucent slides enclosed within the instrument itself.
Uniformity of illumination of the test objects presented to the subject is
made possible in this manner.

This raises a significant problem which has broad implications:
namely, “To what extent can visual acuity be predicted at one level of
illumination from visual acuity measured at another level of illumination?”
The answer to this query has bearing on all visual acuity testing
whether in industry, the school, or the professional office. For instance, 
in many situations the illumination in the test equipment may be differ- 
ent from that which exists at the occupational task itself. Is there not, 
then, considerable possibility of error in establishing minimum occupa- 
tional visual acuity requirements by means of such a test? Since it is 
possible that the amount of light emitted by the illuminant might vary 
from one instrument with a new lamp to another instrument with a lamp 
deteriorated by age, would this not be a source of even further error? 

The writers have attempted to determine how visual acuity measured in
the Ortho-Rater depends upon the illumination within that instrument. It
is beyond the purposes of this paper to discuss visual acuity in its many
definitional ramifications. However, for purposes of discussion in this
paper, visual acuity in the Ortho-Rater is defined operationally as a size
measure of performance in perceiving form based upon the smallest pattern
of details that is perceptible. In the Ortho-Rater acuity tests this
measure is determined by identifying correctly the varied locations of a
checkerboard pattern in a series of targets progressively diminishing
in size.

1 Bausch and Lomb Optical Co., Rochester, N. Y.

This research is limited to the relationship of visual acuity and illumi- 
nation under specific conditions; namely, in the Ortho-Rater. Seeing 
under these circumstances is photopic in nature. Consequently, no men- 
tion will be made of equally interesting scotopic relationships. 


Apparatus and Procedure 


A stock model Ortho-Rater was utilized in which the usual test slides 
were replaced by visual acuity test slides, providing six visual acuity 
slides for distance (26 feet) and five for near (13 inches). The first slide 
in each series was left at standard brightness. Graduated brightness 
levels in the rest of the translucent Ortho-Rater slides were obtained by 
covering their back surfaces with varying combinations and thicknesses 
of white paper. The relative light transmission of the successive slides 
was established by photometer readings. The relative brightnesses in 
each series were computed from these data. The near acuity test is 
actually brighter than the far test at standard illumination, but each 
natural slide has been assigned a relative brightness of 100. The relative 
brightness values for each series are given in Table 1. Slight differences 
in the contrast in the different slides used may be partly responsible for 
the irregularities that are apparent in the data. 
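The relative brightness values in Table 1 express each slide's photometer reading as a percentage of the reading for the standard slide of its series. The following is a minimal Python sketch of that normalization. The function name and the readings are hypothetical, because the photometer data themselves are not reported. The readings were chosen only so that the output reproduces the first four far-series values.

```python
def relative_brightness(readings):
    """Express photometer readings as percentages of the standard slide.

    `readings` is a hypothetical list of transmission readings in any one
    consistent unit, ordered so that the unmodified (standard) slide is first.
    """
    if not readings or readings[0] <= 0:
        raise ValueError("need a positive reading for the standard slide")
    standard = readings[0]
    # Rounded to whole numbers, as the relative brightnesses in Table 1 are.
    return [round(100 * r / standard) for r in readings]


# Hypothetical readings, not data from the study.
print(relative_brightness([50.0, 28.0, 15.0, 6.0]))
```

Because each series is scaled to its own standard slide, the near and far values of 100 are not equal in absolute brightness, as the text notes.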

One hundred students (46 males, 54 females) were used as subjects. 
The mean age was 21. The right eye only of each subject was tested on 
each of the tests, the left eye being occluded (by instrument occluder) 
during the test. 

The Ortho-Rater acuity test items consist of a checkerboard that 
must be discriminated from plain gray areas in order to locate it in one of
several positions. Successive positions are presented in random sequence 
to prevent learning. As a further check, two different slide models with 
different answer sequences were used in the series. The slides were
presented in rotation, starting with different slides for different subjects, so
as to neutralize possible effects of learning (which would increase scores) 
or fatigue (which would decrease scores). 

Test questions regarding the location of the checkerboard in each
target were carefully worded as specified in the bulletin “Standard Practice
in the Administration of the Bausch & Lomb Occupational Vision
Tests with the Ortho-Rater.”* The purpose of the test was explained
to the subject only on completion of his test. Tests proceeded at the
rate of approximately ten per hour.

Table 1
Acuity at Different Levels of Brightness in the Ortho-Rater

           Code    Relative      Mean      S.D. of
Slide      No.     Brightness    Acuity    Acuity
Far
  1        F-3        100         9.42      2.02
  2        F-3         56         8.94      1.89
  3        F-4         30         7.86      1.45
  4        F-3         12         7.96      1.82
  5        F-4          9         7.36      1.50
  6        F-4          3         6.51      1.55
Near
  1        N-2        100        11.14      1.61
  2        N-2         46        10.14      1.56
  3        N-1         20         9.19      1.37
  4        N-1         10         8.97      1.17
  5        N-1          4         8.09      1.17

The rotation sequences proceeded as follows. The first subject was 
presented with far slides from one to six in sequence and with near slides 
from one to five in sequence. The second subject was presented with 
far slides two to six to one in order and with near slides two to five to 
one. The sixth subject began his far test with slide six and his near test 
with slide one, and the seventh subject his far test with slide one, his 
near test with slide two. By this means each slide, with its own bright- 
ness level, appeared successively as sixth test, fifth test, fourth test, etc., 
and each test (first test, second test, etc.) included an equal number of 
readings on each slide, or at each level of brightness. 
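The rotation described above is a cyclic schedule. Subject s begins each series at slide ((s − 1) mod n) + 1, where n is the number of slides in that series, and the order then wraps around to slide 1. The Python sketch below is an illustration based on that reading of the text, and `rotation_order` is a hypothetical name. It reproduces the orders stated for subjects 1, 2, 6, and 7.

```python
def rotation_order(subject, n_slides):
    """Slide order for a 1-based subject number under cyclic rotation.

    Subject 1 starts with slide 1, subject 2 with slide 2, and so on;
    after the last slide the sequence wraps around to slide 1.
    """
    start = (subject - 1) % n_slides  # 0-based index of the first slide
    return [(start + k) % n_slides + 1 for k in range(n_slides)]


FAR_SLIDES, NEAR_SLIDES = 6, 5

for subject in (1, 2, 6, 7):
    far = rotation_order(subject, FAR_SLIDES)
    near = rotation_order(subject, NEAR_SLIDES)
    print(f"subject {subject}: far {far}, near {near}")
```

Over any six consecutive subjects, each far slide occupies each test position exactly once, and the same holds for the near slides over any five consecutive subjects. This is the balancing property the procedure relies on.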


Results 


Mean test scores on successive tests, as shown in Table 2, lack appre- 
ciable variation, indicating the absence of a learning or fatigue factor. 

Mean visual acuity in the Ortho-Rater decreases with reduced illumination
in a logarithmic ratio. Table 1 and Figure 1 show this decrease.
On far acuity, for a decrease from standard illumination to approximately
half that amount, mean acuity decreased from 9.42 to 8.94, a drop of
0.48 or 5%. For a decrease from standard illumination (slide 1) to
one-tenth that amount (interpolated graphically between slides 4 and 5),
acuity decreased to 7.4, a drop of 1.5 or 16%.

Table 2
Mean Visual Acuity on Successive Tests

            Mean Visual Acuity
Test          Far      Near
First         7.88     9.54
Second        7.93     9.47
Third         8.06     9.46
Fourth        7.97     9.58
Fifth         8.10     9.49
Sixth         8.10

* Standard practice in the administration of the Bausch & Lomb Occupational Vision
Tests with the Ortho-Rater. Bausch & Lomb Optical Company, 1944, 9-10.

On near acuity, scores at standard illumination are higher than for 
far acuity, and the decrease is more rapid than for far acuity. For a 
decrease from standard illumination to approximately half that amount, 
acuity decreased from 11.14 to 10.14, a drop of 1.00 or 9%. For a 
decrease from standard illumination (slide 1) to one-tenth that amount 
(slide 5), acuity decreased to 8.09, a drop of 3.05 or 22%. 
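The statement that acuity falls "in a logarithmic ratio" can be checked against Table 1 by regressing mean acuity on the base-10 logarithm of relative brightness. The sketch below is an illustration, not the authors' computation. The straight-line-in-log-brightness form is an assumption, and the fitted slope is read as the change in mean acuity per tenfold change in brightness.

```python
import math

# Relative brightness -> mean acuity, transcribed from Table 1.
FAR = {100: 9.42, 56: 8.94, 30: 7.86, 12: 7.96, 9: 7.36, 3: 6.51}
NEAR = {100: 11.14, 46: 10.14, 20: 9.19, 10: 8.97, 4: 8.09}


def fit_log_line(series):
    """Least-squares fit of: mean acuity = a + b * log10(relative brightness)."""
    xs = [math.log10(brightness) for brightness in series]
    ys = list(series.values())
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


for name, series in (("far", FAR), ("near", NEAR)):
    a, b = fit_log_line(series)
    # log10(10) = 1, so the fitted acuity at one-tenth brightness is a + b.
    print(f"{name}: slope {b:.2f} per tenfold change; "
          f"fitted acuity at brightness 10: {a + b:.2f}")
```

The fits give slopes of roughly 1.8 (far) and 2.1 (near), which agrees with the observation that near acuity falls off more rapidly than far acuity.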


[Figure: mean visual acuity (vertical axis) plotted against brightness (horizontal axis, marked 100, 50, 10)]

Fig. 1. Changes in visual acuity with decreasing brightness
in the Ortho-Rater. N = 100.










Data on lumen-life depreciation of 25 watt 115 volt T-110 frosted 
showcase lamps used in the Ortho-Rater indicate only a 13% loss in 
brightness with 300 hours of use at 115 volts, at which time it is
recommended that the lamps be replaced. There may also be a ±15%
variation in brightness with variations of voltage of ±5 volts from the standard
115 volts. Any of these variations from standard brightness could be
expected to produce a change in mean acuity of less than 0.2, in comparison
with a standard error of measurement of around 1.0 in these tests.
Variations of this amount are not clinically significant and probably are 
unimportant for any practical use of these tests. Extreme fluctuations 
due to such causes should, of course, be avoided. 
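The "less than 0.2" figure can be approximated by multiplying the change in log10 brightness by an assumed acuity slope per tenfold change. The slopes below are assumptions, roughly what a straight-line fit of mean acuity on log brightness gives for the far and near series of Table 1. The brightness ratios come from the lamp figures quoted above.

```python
import math

# Assumed slopes (acuity units per tenfold change in brightness), read
# approximately from a log-linear fit of the Table 1 means.
SLOPE_PER_DECADE = {"far": 1.8, "near": 2.1}

# Brightness relative to standard for the variations quoted in the text.
CASES = {
    "13% lumen depreciation": 0.87,
    "+15% (voltage high)": 1.15,
    "-15% (voltage low)": 0.85,
}

for label, ratio in CASES.items():
    for test, slope in SLOPE_PER_DECADE.items():
        change = slope * math.log10(ratio)  # expected shift in mean acuity
        print(f"{label:24s} {test:4s} change in mean acuity {change:+.2f}")
```

Even the steeper near slope gives shifts of only about 0.15, well below the standard error of measurement of about 1.0.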

Intercorrelations of visual acuity at different levels of illumination 
are shown in Table 3. Intercorrelations on the far tests range from .90 
to .71, decreasing generally with greater differences in brightness. On 
the near tests the intercorrelations range from .80 to .59. For brightness 
levels between standard and one-tenth that amount, intercorrelations on 
the far tests would range from .90 to .80 and on the near tests from 
.80 to about .70. 


Table 3
Intercorrelations of Visual Acuity at Different Levels of Brightness

Far
Level of
Brightness     1      2      3      4      5
    2         .90
    3         .81    .83
    4         .79    .85    .80
    5         .80    .84    .84    .79
    6         .74    .74    .73    .71    .83

Near
Level of
Brightness     1      2      3      4
    2         .77
    3         .69    .80
    4          ?     .69    .79
    5         .60    .66    .71    .59

These values are not far below the reliabilities for these tests, accord- 
ing to a series of as yet unpublished experiments at Purdue University 
and elsewhere. Lower reliability and lower intercorrelations on the near 
tests are due in part to a smaller “range of talent” on the near tests, as
shown by the standard deviations in Table 1.










Conclusions 


Intercorrelations of acuity over a range of brightness in the ratio of 
10 to 1 are about as high as could be expected in comparison with test 
reliability. Those individuals who score highest in acuity at one level of 
brightness are, in general, the same ones who score highest at other levels 
of illumination. (It should be reiterated here that this applies only to
photopic levels of brightness, particularly to brightness levels produced 
in the Ortho-Rater with the checkerboard acuity test object.) Acuity 
measured at one level of brightness may legitimately be related to 
performance on a task requiring visual perception at another level of 
illumination. 

It did not, of course, require an experiment such as this one to demon- 
strate that fact. Visual acuity and other tests in the Ortho-Rater have 
been related to success in the performance of many jobs with various 
levels of illumination and other visual working conditions. This experi- 
ment demonstrates one reason for such test validity in various occupa- 
tional conditions. It indicates further that much greater validity could 
not be expected by adjusting the brightness in the Ortho-Rater to ap- 
proximate the brightness of the occupational task with the performance 
of which the test results are to be related. 

Visual acuity requirements established for placement of employees
on a given job are valid for any reasonable variations in lamp brightness
between Ortho-Raters.

Those who are placed on a critical job because they can see better
would see somewhat better still if the brightness of the task could be
increased.


Summary 


1. Visual acuity, right eye, of 100 college students was measured in 
the Ortho-Rater with different levels of illumination covering a range of 
about 33 to 1 for the far tests and 25 to 1 for the near tests. 

2. In this experiment there was no apparent learning or fatigue factor 
to bias the results, and the sequence of tests was varied systematically 
to neutralize any constant error. 

3. Mean visual acuity decreased approximately 16% (far) and 22% 
(near) for reduction in brightness in the ratio of 10 to 1, and 5% (far) 
and 9% (near) for reduction in brightness in the ratio of approximately 
2 to 1 from standard brightness. 

4. Expected variations in brightness of the illuminant with age and
with modest variations in voltage would produce negligible variation in 
acuity measurements. 













5. Intercorrelations of acuity at different levels of brightness over a 
range of 10 to 1 are .80 to .90 for the far tests and .70 to .80 for the near 
tests—not much lower than the test-retest reliability. 

6. This accounts for one element in the validity of test results with 
respect to occupational success on tasks involving various degrees of 
brightness. 

7. Much greater validity could not be expected by adjusting the 
brightness in the Ortho-Rater to correspond with the brightness of ordi- 
nary tasks. 

8. These relations were demonstrated only for photopic levels of 
illumination in the Ortho-Rater with the checkerboard test object. 
Similar relations may be true also for other types of acuity tests with 
standardized illumination, but for letter tests or other shape characters
they would need to be demonstrated independently.


Received May 8, 1947. 
Early publication. 





Reliability of Anecdotal Material in the First Annual 
Science Talent Search 


Harold A. Edgerton 
Ohio State University 


Steuart Henderson Britt 
McCann-Erickson, Inc., New York City 


and 


William B. Lemmon 
University of Oklahoma


Scientists and laymen alike realize that this nation has large potential 
human resources not at present utilized or even recognized. The Science 
Talent Search, administered by Science Service through Science Clubs 
of America and supported financially by the Westinghouse Electric
Corporation, is designed to discover and subsidize future contributors to
scientific knowledge and endeavor. Scholarships are awarded each year to
outstanding high-school students in order to further their potentialities 
as scientific leaders of the future. 

All contestants of the first two years, and all those who won honors in
later years, are being followed up annually in order to learn more about
the optimal conditions for the development of scientific skills, as well
as the personal characteristics and experience associated with them.

The present article is an analysis of certain aspects of the first annual 
Science Talent Search. Selection techniques have been modified from 
year to year, so that the present (sixth) annual contest hurdles are slightly 
different in form and in relative weights assigned. 

The first annual Science Talent Search was administered to high- 
school seniors throughout the United States in the spring of 1942. Of 
approximately 15,000 credentials supplied to applicants, 3,175 were re- 
turned to Science Service, completed and within the time limit set, to be 
considered for scholarship awards. Of the group completing all the re- 
quirements for entrance into the competition, 300 boys and girls were 
selected as outstanding, and 40 of these were brought to Washington, 
D. C., to attend the Science Talent Institute and to be considered for the
top scholarships. 











The selection of winners of the Annual Science Talent Search involves 
the use of a series of hurdles. The selection techniques employed in the 
several Science Talent Searches have been described by Edgerton and
Britt.1 A “successive hurdles” technique has been used, basing the hurdles
on evidence obtained from a Science Aptitude Examination, a high-school
transcript, a personal recommendation by teachers, an essay on
an assigned scientific topic, and (for the top 40) psychological and
psychiatric interviews.
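The successive-hurdles idea can be sketched as a pipeline in which only candidates who clear one hurdle are considered at the next. The function name, the candidate fields, and the cutoffs below are hypothetical illustrations and not the contest's actual criteria. The sketch shows only the filtering logic.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Candidate = Dict[str, float]
Hurdle = Tuple[str, Callable[[Candidate], bool]]


def successive_hurdles(candidates: Sequence[Candidate],
                       hurdles: Sequence[Hurdle]) -> List[Candidate]:
    """Apply hurdles in order; only those passing one reach the next."""
    remaining = list(candidates)
    for name, passes in hurdles:
        remaining = [c for c in remaining if passes(c)]
        print(f"after {name}: {len(remaining)} candidate(s) remain")
    return remaining


# Hypothetical candidates and cutoffs, for illustration only.
pool = [
    {"aptitude": 72, "index": 1.4, "recommendation": 8},
    {"aptitude": 55, "index": 2.0, "recommendation": 9},
    {"aptitude": 80, "index": -0.5, "recommendation": 7},
    {"aptitude": 68, "index": 0.9, "recommendation": 4},
]
hurdles = [
    ("aptitude examination", lambda c: c["aptitude"] >= 60),
    ("rank/science index", lambda c: c["index"] >= 0.0),
    ("teacher recommendations", lambda c: c["recommendation"] >= 6),
]
finalists = successive_hurdles(pool, hurdles)
```

A design consequence worth noting is that a candidate who is strong on a later hurdle (such as the second one above) is never evaluated on it if an earlier hurdle eliminates him.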

This paper presents a study of the third hurdle, the recommendations 
filled in by the teachers. This hurdle has been studied from the point of
view of the reliability of scoring, interrelation of the several trait groups 
considered, and the correlation with the other selectors used. 

The first hurdle, the Science Aptitude Examination, was believed to 
be the most valid predictor. In the first year this Examination was es- 
sentially a paragraph reading test of one hundred items, employing sci- 
entific subject matter as the vehicle. 

The second hurdle for those candidates who had the highest scores on 
the Science Aptitude Examination was a compound index based upon the 
student’s relative rank in his high-school graduating class and the number 
of units of science courses which he had taken in high school. These 
selectors were differentially weighted, rank in class being given a weight 
five times as great as the amount of credit in science. 
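The source states only the 5:1 weighting, not how the two selectors were put on a common scale. The sketch below therefore assumes that class standing is expressed as a percentile (higher is better) and that both selectors are converted to standard scores before weighting. The function names and the three candidates are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize numbers to mean 0 and population standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def hurdle_two_index(rank_percentiles, science_units,
                     rank_weight=5.0, science_weight=1.0):
    """Weighted composite of class standing and science credit (assumed form).

    Both selectors are standardized first, so that the 5:1 weights act on
    comparable scales rather than on raw units.
    """
    zr, zs = z_scores(rank_percentiles), z_scores(science_units)
    return [rank_weight * r + science_weight * s for r, s in zip(zr, zs)]


# Three hypothetical candidates (not data from the study).
percentiles = [99, 80, 60]
units = [2, 4, 3]
print([round(x, 2) for x in hurdle_two_index(percentiles, units)])
```

In this example, the candidate with the highest class standing leads despite having the fewest science units. With equal weights, the second candidate would lead instead.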

The third hurdle was based on a series of recommendations filled in by 
one or more of the contestant’s teachers. The information supplied on 
the recommendation blank was essentially anecdotal in form; that is, 
that faculty member in the best position to judge a given student’s fitness 
for a Westinghouse Science Scholarship award was asked to supply specific 
examples of the candidate’s activities which were judged to be pertinent to 
and illustrative of the possession of a number of personal characteristics. 
The recommender was instructed that only specific, objective information 
would be considered, and illustrative leads were supplied for each of the 
ten categories. 

The directions to the teachers for filling in the personal data blanks 
were as follows:

“Inasmuch as a ‘general’ recommendation is ... useless, it is proposed
in Part II that specific evidence be given with examples of the capacities,
interests, personality, and work habits of the student. In filling in these two
pages, please be as specific as possible; e.g., ‘Has designed and built a flying
model airplane’; ‘has organized a group to learn radio code and kept the group
going’; ‘has studied trigonometry on his own, outside of school’; ‘has kept the
apparatus of our physics department in repair’; ‘has made field observations
and drawn a topographic map of the city park’; etc.”

1 Edgerton, H. A., and Britt, S. H. The first annual science talent search. American
Scientist, 1943, 31, 55-68; How science talent search winners are chosen. Science
and the Future, Science Service, Washington, D. C., 1943, pp. 112-115; The science
talent search. Occupations, 1943, 22, 177-180.


The ten categories employed were: Attitude, Purpose, and Ambition; 
Scientific Attitude; Work Habits; Resourcefulness; Social Skills; Cooper- 
ativeness; Initiative; Responsibility; Mechanical Ability; and Special 
Abilities. The specific questions asked for each category follow: 


1. Attitude, Purpose, Ambition: Does the student have a purpose and
a program? What is this program and where is it likely to lead? Is it
compatible with his abilities, interests, and previous experiences? Would he
carry this program through in spite of difficult circumstances?

2. Scientific Attitude: What has the student done to demonstrate a
scientific attitude? Has he designed an experiment by himself? Does he
“jump to conclusions”? Is he objective about most situations or does he
react emotionally? Does he discriminate between pertinent and non-pertinent
evidence in solving a problem? Does he “try it and see”?

3. Work Habits: What has this student done or failed to do which
demonstrates the quality of his work habits? Does he attend to details, finish
his work on time, stick to the task until it is finished, work steadily at an
assigned job?

4. Resourcefulness: What has the student done to show his
resourcefulness? Has he let failure stop him? Is he able to adapt his ideas and
methods to new situations? Does he always follow “the way it was done by
others”?

5. Social Skills: Does this student have the ability to direct others, to
gain the whole-hearted cooperation of students and associates, to make a 
favorable impression on persons he meets? What has he done by which you 
could judge his social competence? 

6. Cooperativeness: In what ways has the student worked well with
others, contributed ideas, done his share of work, listened to suggestions and 
ideas of classmates? 

7. Initiative: Has he introduced new ideas and new ways of approaching
problems? Is he a self-starter? What has he done to show his initiative? 

8. Responsibility: Can he be trusted with money, property, and
confidential information? Can he be depended upon to complete a task? What
has he done to demonstrate his ability to take responsibility? 

9. Mechanical Ability: What can the student do with tools and machines?
What devices or apparatus has he constructed? What experience has he had 
with mechanical things? Can he plan as well as construct apparatus and 
devices? 

10. Special Abilities: What are his most outstanding traits? What
special abilities and skills has the student demonstrated which help him in 
making contributions to science (e.g., glass blowing, facility in mathematics, 
care of laboratory animals, construction of apparatus from odds and ends, etc.)? 


These recommendations of the teachers were evaluated by trained 
raters as to whether or not adequate pertinent information had been sup- 
plied. Each category was scored either plus one or zero, and the total 
recommendation score for each candidate was computed by summing the 
number of categories given credit. 
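The scoring rule just described (plus one or zero per category, summed over the ten categories) can be sketched as follows. The function name and the example blank are hypothetical. The category names are those listed above.

```python
CATEGORIES = (
    "Attitude, Purpose, and Ambition", "Scientific Attitude", "Work Habits",
    "Resourcefulness", "Social Skills", "Cooperativeness", "Initiative",
    "Responsibility", "Mechanical Ability", "Special Abilities",
)


def recommendation_score(credited):
    """Total score = number of the ten categories credited with +1.

    `credited` maps a category name to True if the rater judged that adequate
    pertinent information was supplied; missing or False entries score zero.
    """
    unknown = set(credited) - set(CATEGORIES)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    return sum(1 for name in CATEGORIES if credited.get(name, False))


# A hypothetical blank credited in two categories.
print(recommendation_score({"Work Habits": True, "Initiative": True,
                            "Special Abilities": False}))
```

Because every category carries the same weight, the total can range only from 0 to 10, and the item statistics discussed below are based on this 0-to-10 scale.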

For the purposes of the present study—an analysis of the anecdotal 
materials in the third hurdle—a large number of the recommendation 
forms were secured and a special rating sheet designed so that each recom- 








416 H. A. Edgerton, S. H. Britt, and W. B. Lemmon 


mendation blank could be scored repeatedly without making any marks 
on the blank itself. A random sample of the recommendations was drawn 
by sexes, rating sheets were inserted, and a group of psychologists con- 
sisting of university staff members and graduate students were asked to 
rate and re-rate groups of fifty blanks. The individual recommendation 
blanks were serially numbered, both individually and in groups of fifty. 
Each rater rated candidates of both sexes. Each sub-group of
recommendations was rated several times, and the ratings series were designed
to overlap from person to person. A sample of several hundred were also 
rated at least twice by the same judge. Each rater was provided only 
typed directions for his evaluation. These amplified for scoring purposes 
the directions given teachers for filling in the recommendations in the 
Personal Data Blank. 

For each rater, three statistics were computed: his mean rating, the
variance of his ratings, and an estimate of the reliability of his ratings.
These are shown in Table 1. The Kuder-Richardson2 Case Four formula was
employed to obtain an estimate of reliability. It assumes (a) that the matrix
of inter-item correlations has a rank of 1, (b) that these correlations are
equal, and (c) that all items have the same difficulty.
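Under the equal-difficulty assumption, the Kuder-Richardson estimate reduces to the formula usually cited as KR-21. That formula needs only the mean M and variance σ² of total scores and the number of items k: r_jj = [k/(k − 1)][1 − M(k − M)/(kσ²)]. The identification of Case Four with KR-21 is an assumption here. It is supported by the fact that the formula, applied with k = 10, reproduces the tabled coefficients for the raters tried below.

```python
def kr21(mean, variance, n_items):
    """Kuder-Richardson reliability assuming equal item difficulty (KR-21).

    mean and variance describe total scores on a test of n_items items,
    each scored 1 or 0.
    """
    k = n_items
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


# Raters A, B, and K from Table 1 (ten categories per blank).
for rater, m, var in (("A", 5.20, 8.35), ("B", 5.86, 10.63), ("K", 4.87, 4.88)):
    print(rater, round(kr21(m, var, 10), 2))
```

Because the formula depends only on M and σ², a rater whose total scores spread widely (large σ²) obtains a high coefficient whatever the source of that spread. This is consistent with the discussion of halo effect that follows.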

The reliabilities given in Table 1 show quite a discrepancy between 
the obtained reliability coefficients and those which might be hoped for. 
In general, there is a tendency for the highest coefficients to be associated 
with a large variance and somewhat less with a large mean value of the 
total rating score. If we assume that the several sub-samples were ran- 
dom selections from a larger population, comparisons may be made be- 
tween raters. This would suggest that the more lenient and perhaps less 
careful raters (as evidenced by higher means and variances) tended in 
their evaluation to create artificially a situation which approached the 
assumptions underlying the Kuder-Richardson reliability function. Further,
an evaluator who was sufficiently critical to avoid halo effect and
to discriminate fairly clearly, item by item, tended to emphasize the
independence of the several items, with a consequent drop in the
intercorrelations producing a lower value of r_jj. Thus, in this situation the
assumptions of the Kuder-Richardson formula can be approximated only by
adding the halo effect. An inspection of the actual ratings provided by
those evaluators who had the highest r_jj's showed that in many cases
items were given credit in violation of the letter as well as the spirit of 
the evaluation directions. That is, apparently the halo effect and a ten- 
dency to over-value the materials submitted made for an artificially high 
relationship between the recommendation continua. 


2 Kuder, G. F., and Richardson, M. W. The theory of the estimation of test
reliability. Psychometrika, 1937, 2, 151-160.










Table 1
Reliability Coefficients (Kuder-Richardson) for All Raters

Rater      M       σ²      r_jj
A        5.20     8.35     .78
B        5.86    10.63     .86
C        3.08              .62
D        1.92              .50
E        2.31              .76
F        3.74     5.83     .66
G        3.18              .62
H        5.64              .84
I        5.88     5.30     .60
J        4.20     8.04     .77
K        4.87     4.88     .54
L        4.46     8.04     .77
M        4.20     7.60     .75
N        2.94     5.30     .51
O        3.80     7.44     .76
P        3.64     6.11     .69
Q        2.78     4.29     .59
R        4.40     6.24     .67
S        3.48     5.42     .64
T        4.34     5.18     .58
U        3.26     5.43     .66
V        2.68     5.02     .68
W        2.82     3.71     .50
X        3.40     7.08     .76
Y        3.10     5.36     .67





One source of variation, in addition to variance in rating evaluation 
attributable to judges, is inherent in the items themselves. If item re- 
liabilities alone are to be considered, an attempt should be made to hold 
constant the variance attributable to raters. An approximation of this 
requirement can be made by pooling the judgments of the second eval- 
uators. The great majority of the evaluators who scored investigation 
samples were divided between rate and rerate groups; that is, each 
evaluator was given a group of one hundred personal data blanks which 
was composed of two subgroups of fifty. One of these subgroups had 
already been evaluated, and the other was passed on to another evaluator
to be rerated.

Inspection of the first column of correlations in Table 2 reveals that 
both Item 1, Attitude, Purpose, and Ambition, and Item 10, Special 
Abilities, show correlations between raters below .40. This is consistent
with the results obtained when raters were varied and emphasizes the
inconsistency of both variables. The defects in these items appear to be
independent of rater idiosyncrasies; and either a revision of the item
directions, clarification of direction to evaluators, or a discard of these items
is indicated. On the other hand, the other eight items have high cor- 
relations from rater to rater. 


Table 2
Consistency Correlations (Tetrachoric r's) for All Evaluators (Rate-Rerate), for a
Severe Evaluator, and Between a Severe and a Lenient Evaluator

                                                          Severe vs.
Category                               All     Severe     Lenient
1. Attitude, Purpose, and Ambition     .37      .78         .48
2. Scientific Attitude                 .80      .91         .65
3. Work Habits                         .64      .93         .71
4. Resourcefulness                     .72      .93         .86
5. Social Skills                       .76      .92         .90
6. Cooperativeness                     .79      .90         .63
7. Initiative                          .81      .90         .84
8. Responsibility                      .87      .91         .80
9. Mechanical Ability                  .60      .90         .62
10. Special Abilities                  .36      .65         .32
Total Score                            .90      .90         .73





At the time the evaluations were made, two of the evaluators agreed to 
evaluate a portion of their sample a second time on separate rating forms 
with an interval of several days between ratings. This technique was 
employed with two samples of one hundred personal data blanks, and it 
was therefore possible to sort out from the total group the individuals who
were rated by these evaluators and correlate the responses both on an item 
basis and on a total score basis. Inspection of the second column of 
correlations in Table 2 shows that correlation coefficients larger than .90 
were obtained by this method. This evaluator should be classed as 
severe; the variance and means for his samples were low. The correla- 
tions for single items are of approximately the same magnitude, although 
two of the item correlations are somewhat lower than the correlation be- 
tween composites. This is particularly true in the case of item 1 and 
item 10, and reflects the difficulty or equivocality of these items. 
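Each consistency coefficient in Table 2 is a tetrachoric r. It is computed from a 2 × 2 table that counts how often two ratings of the same blanks agreed or disagreed on giving credit. The source does not say how the coefficients were computed. The sketch below uses the well-known cosine-pi approximation, r ≈ cos(π / (1 + √(ad/bc))), and is not necessarily the authors' procedure. The cell counts in the usage line are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    2 x 2 table of two dichotomous ratings of the same blanks:
                          second: credit   second: no credit
        first: credit           a                 b
        first: no credit        c                 d
    """
    if b * c == 0:
        # No disagreements: the approximation tends to +1
        # (undefined if a * d is also zero).
        return 1.0 if a * d > 0 else float("nan")
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))


# Hypothetical counts: 80 of 100 blanks rated the same way both times.
print(round(tetrachoric_cos_pi(40, 10, 10, 40), 3))
```

The approximation is exact for a symmetric table with equal margins and becomes rougher as the margins become very unequal. Coefficients near the extremes, such as item 10, are therefore best read as approximate.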

The third column of correlations of Table 2 shows the correlation be- 
tween the ratings of one severe and one lenient rater. When the sample 
first rated by the evaluator judged most lenient was re-rated by a severe 
evaluator, a correlation coefficient of .82 was obtained when total scores 
were compared; and an average correlation of .76 when individual items 
were correlated. This finding is consistent with the previous ones, namely, 
that some differences obtain between evaluators and that these differences
are greater when measured in terms of individual items than by composite
scores. The individual consistency of the severe rater, as noted above, 
exceeded .90 on a rate-rerate basis (total scores). Unfortunately, the 
lenient rater was not available for a rescoring of the sample so that a rate- 
rerate coefficient could not be obtained for him. It might be inferred, 
however, that if such a measure were obtainable, it would be somewhat 
smaller in magnitude than that of the severe evaluator, since what is here 
called “leniency” is apparently describable as the presence of the halo 
effect, as well as poor understanding and following of directions. Com- 
bined with these variables, perhaps a manifestation of the halo effect, is a 
large “human sympathy.” The lenient rater in question testified to a
considerable sympathy for the students, who, it was felt, had not been
“done justice” by the persons making the recommendations.

Operationally considered, these “reliability” coefficients are quite
valuable, since they indicate that a well-trained, conscientious evaluator
can perform consistently from situation to situation. The “reliability
coefficients” of the items also provide a measure of their relative clarity.
At the time original evaluations were made, that is, during the actual 
selection process itself, a correlation of .92 was obtained when the same 
blanks were evaluated by several raters. It seems probable that the 
magnitude of these correlations is in part at least attributable to the 
method of training. In the scholarship awards situation, evaluation was 
done on a “seminar” basis, criteria were set up and whenever a difficult
case arose, it was discussed quite carefully by all of the raters until agree- 
ment had been reached. The raters used in this study, on the other hand, 
operated by themselves and were guided only by the brief typewritten 
direction manual. Under these circumstances, it might be expected that 
one individual’s interpretation, even of the same item, should change 
somewhat since consistency checks and outside discussions were im- 
possible. An informal investigation of the ratings in sequence seems to 
suggest that, for many raters, interpretation and severity change with 
practice. The direction and magnitude of this change are not consistent
from individual to individual. When the evaluators were questioned, 
some of them confessed to increasing boredom, others to feelings of ex- 
asperation with the recommenders, sympathy with the student recom- 
mended, and varying interpretations of similar material. Evaluating a 
shorter sample or discussing difficult cases might conceivably have gone 
far to remedy this situation. The assumption of motivational consistency 
on the part of the evaluators throughout the sample rated seems to be 
unwarranted. If it may be assumed that each evaluation sub-popula- 
tion is a random sample of the total, the correlations from rater to rater 
can be attributed, in large part at least, to rater differences. 
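The rate-rerate coefficients discussed above are, in the usual sense, product-moment correlations between the totals one rater gave the same blanks on two occasions. A minimal Python sketch, assuming the two sets of totals are held in equal-length lists; the function name `rate_rerate_r` and the sample totals are hypothetical, not data from the study:

```python
from math import sqrt
from statistics import mean


def rate_rerate_r(first, second):
    """Pearson product-moment r between first and second ratings of the same blanks."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists of at least two ratings")
    m1, m2 = mean(first), mean(second)
    # Cross-products and squared deviations from each occasion's mean
    sxy = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    sxx = sum((a - m1) ** 2 for a in first)
    syy = sum((b - m2) ** 2 for b in second)
    if sxx == 0 or syy == 0:
        raise ValueError("ratings must vary on both occasions")
    return sxy / sqrt(sxx * syy)


# Hypothetical totals for five blanks, scored twice by the same rater
first_scoring = [7, 4, 9, 5, 6]
second_scoring = [6, 4, 9, 5, 7]
print(round(rate_rerate_r(first_scoring, second_scoring), 2))  # 0.93
```

A rater whose second scoring reproduced the first exactly would obtain r = 1.00; drift in interpretation or severity between the two occasions lowers the coefficient.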











420 H. A. Edgerton, S. H. Britt, and W. B. Lemmon 


In Table 3 are shown the correlations among the ten items for boys, 
based on the ratings of all raters. Similar data are shown for girls in 
Table 4. 


Table 3 


Interrelationships Between Items on the Recommendation Blank for Boys 
(Tetrachoric r) 











Item                                     1     2     3     4     5     6     7     8     9    10
 1. Attitude, Purpose, and Ambition      –   .29   .55   .32   .58   .29   .35   .19   .41   .10
 2. Scientific Attitude                .29     –     ?     ?     ?     ?     ?     ?     ?   .30
 3. Work Habits                        .55     ?     –   .44     ?     ?     ?     ?     ?   .10
 4. Resourcefulness                    .32     ?   .44     –     ?     ?     ?     ?     ?   .38
 5. Social Skills                      .58     ?     ?     ?     –     ?     ?     ?     ?  −.10
 6. Cooperativeness                    .29     ?     ?     ?     ?     –     ?     ?     ?   .02
 7. Initiative                         .35     ?     ?     ?     ?     ?     –     ?     ?   .20
 8. Responsibility                     .19     ?     ?     ?     ?     ?     ?     –     ?   .13
 9. Mechanical Ability                 .41     ?     ?     ?     ?     ?     ?     ?     –   .25
10. Special Abilities                  .10   .30   .10   .38  −.10   .02   .20   .13   .25     –
    Mean                                 ?     ?     ?     ?     ?     ?     ?     ?     ?   .15





All of the intercorrelations are tetrachoric r’s. Inspection of these
tables would seem to indicate relative stability of item relationships
except for Item 10, Special Abilities: for boys, the correlations of this item
with the other items vary from −.10 to +.38; for girls, they vary from +.36
to +.94, with the exception of one correlation coefficient of −.03.
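The paper does not describe how its tetrachoric r’s were computed; in that period they were commonly read from computing diagrams or tables. Purely as an illustration, the well-known cosine-pi approximation estimates tetrachoric r from a 2×2 table of credit/no-credit counts on two items. A minimal sketch, assuming each item is scored dichotomously; the function name and the counts are hypothetical, and this approximation is not claimed to be the authors’ procedure:

```python
from math import cos, pi, sqrt


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to tetrachoric r for a 2x2 table of counts.

                            item Y credited   item Y not credited
        item X credited            a                  b
        item X not credited        c                  d
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    ad, bc = a * d, b * c
    if ad == 0 and bc == 0:
        return float("nan")  # the table carries no information about association
    if bc == 0:
        return 1.0  # limiting value when an off-diagonal cell is empty
    if ad == 0:
        return -1.0  # limiting value when a diagonal cell is empty
    return cos(pi / (1 + sqrt(ad / bc)))


# Hypothetical counts for two items rated on 100 blanks
print(round(tetrachoric_cos_pi(40, 10, 10, 40), 3))  # 0.809
```

When ad equals bc the estimate is zero, and it approaches +1 or −1 as one diagonal of the table dominates; the approximation is roughly adequate when both items split near the middle.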

The average intercorrelations of each item with all of the others in
Table 4 (girls) are fairly uniform, ranging from .40 to .59.
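The “Mean” rows of Tables 3 and 4 appear to be simple arithmetic means of each item’s nine correlations with the other items; the paper does not state this explicitly. Taking the nine correlations of Item 10 with the other items for girls as read from Table 4, a plain average reproduces the tabled mean of .55. A minimal sketch under that assumption (the helper name is hypothetical):

```python
from statistics import mean

# Correlations of Item 10 (Special Abilities) with Items 1-9 for girls, read from Table 4
item10_girls = [0.94, 0.52, 0.36, 0.57, 0.62, 0.83, 0.52, 0.61, -0.03]


def mean_intercorrelation(rs):
    """Arithmetic mean of one item's correlations with each of the other items."""
    if not rs:
        raise ValueError("need at least one correlation")
    return mean(rs)


print(round(mean_intercorrelation(item10_girls), 2))  # 0.55
```

A more refined average would pass each r through Fisher’s z transformation before averaging; the simple mean is shown here only because it matches the printed value.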

The intercorrelations should not be thought of as interrelations of the
item names, but of the objective evidence submitted and accepted under
those rubrics. The average intercorrelations for each item with all of the
others in Table 3 (boys) are not quite so high as for girls. Item 10,
Special Abilities, shows little relationship to the others, while for girls
that item correlates about as well with the other items as do the other
nine. This may be due to acceptance of a greater variety of evidence





Reliability of Anecdotal Material 


Table 4 


Interrelationships Between Items on the Recommendation Blank for Girls 
(Tetrachoric r) 








Item                                     1     2     3     4     5     6     7     8     9    10
 1. Attitude, Purpose, and Ambition      –     ?     ?     ?     ?     ?     ?     ?   .31   .94
 2. Scientific Attitude                  ?     –     ?     ?     ?     ?     ?     ?   .46   .52
 3. Work Habits                          ?     ?     –   .75     ?     ?     ?     ?   .33   .36
 4. Resourcefulness                      ?     ?   .75     –     ?     ?     ?     ?     ?   .57
 5. Social Skills                        ?     ?     ?     ?     –     ?     ?     ?   .48   .62
 6. Cooperativeness                      ?     ?     ?     ?     ?     –     ?     ?     ?   .83
 7. Initiative                           ?     ?     ?     ?     ?     ?     –     ?     ?   .52
 8. Responsibility                       ?     ?     ?     ?     ?     ?     ?     –     ?   .61
 9. Mechanical Ability                 .31   .46   .33     ?   .48     ?     ?     ?     –  −.03
10. Special Abilities                  .94   .52   .36   .57   .62   .83   .52   .61  −.03     –
    Mean                               .59   .50     ?   .51   .56   .51   .51   .45   .40   .55





as Special Ability for girls than for boys. Several of the evaluators
commented that they did accept evidence for “10, Special Abilities” for girls
that they would not accept for boys. In fact, for girls, almost any
evidence submitted was given credit, including a large number of items which
were almost uniformly given no credit when appearing on a boy’s blank.
Apparently this item was not very clearly understood either by the recom- 
mender or by the evaluator; or it may be, of course, that Item 10 is fairly 
independent of all of the other items. 

There is still another method of determining how an individual item
performs in terms of its relationship to other items. This is a method of
internal consistency, which involves the correlation of the item response
with a composite score derived from all of the ratings.
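Tables 5 and 6 report this item-total relationship as biserial coefficients. A minimal Python sketch of the textbook biserial formula, assuming each item is scored 1 (credit) or 0 (no credit) and the composite is the total recommendation score; the function name and the sample data are hypothetical, and the paper does not print its computing routine:

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(item, total):
    """Biserial r between a credit (1) / no-credit (0) item and a continuous total score."""
    if len(item) != len(total):
        raise ValueError("item and total must have the same length")
    if any(i not in (0, 1) for i in item):
        raise ValueError("item scores must be 0 or 1")
    credited = [t for i, t in zip(item, total) if i == 1]
    not_credited = [t for i, t in zip(item, total) if i == 0]
    if not credited or not not_credited:
        raise ValueError("item must contain both 0s and 1s")
    p = len(credited) / len(total)  # proportion given credit on the item
    q = 1 - p
    # Ordinate of the unit normal curve at the point dividing its area into q and p
    y = NormalDist().pdf(NormalDist().inv_cdf(q))
    sigma = pstdev(total)  # standard deviation of all total scores
    return (mean(credited) - mean(not_credited)) / sigma * (p * q / y)


# Hypothetical credit marks on one item and total scores for eight blanks
item_scores = [1, 1, 0, 1, 0, 0, 1, 0]
total_scores = [8, 6, 5, 7, 6, 3, 5, 4]
print(round(biserial_r(item_scores, total_scores), 2))  # 0.84
```

Because the total here includes the item itself, the coefficient is somewhat inflated by part-whole overlap. Biserial r also assumes a normally distributed trait underlying the credit/no-credit split and can exceed 1.00 when that assumption fails badly.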

An inspection of Tables 5 and 6 shows that there is a fairly high uni- 
form relationship between each item and the total score (last column of 
table). Here some differences are indicated between boys and girls, 
although each follows the same trend as the group as a whole. 

Table 5

Biserial Coefficients of Correlation for Boys Between Item Scores on the
Recommendation Blanks and Other Science Talent Search Variables

                                      Correlation of Item on Recommendation Blank with
                                      No. of Students    Science Aptitude    Total Score on
Item on Recommendation                in Senior Class    Test Score          Recommendation
 1. Attitude, Purpose, and Ambition        .01                 ?                  .67
 2. Scientific Attitude                   −.06               .13                  .69
 3. Work Habits                            .05              −.01                  .68
 4. Resourcefulness                        .05               .09                  .75
 5. Social Skills                          .03               .09                  .73
 6. Cooperativeness                        .01               .12                  .71
 7. Initiative                             .12               .07                  .75
 8. Responsibility                        −.07               .05                  .73
 9. Mechanical Ability                    −.01               .10                  .69
10. Special Abilities                     −.05               .06                  .39


Table 6

Biserial Coefficients of Correlation for Girls Between Item Scores on the
Recommendation Blanks and Other Science Talent Search Variables

                                      Correlation of Item on Recommendation Blank with
                                      No. of Students    Science Aptitude    Total Score on
Item on Recommendation                in Senior Class    Test Score          Recommendation
 1. Attitude, Purpose, and Ambition        .01               .01                  .62
 2. Scientific Attitude                   −.11               .01                  .84
 3. Work Habits                            .06               .10                  .75
 4. Resourcefulness                        .07               .05                  .80
 5. Social Skills                          .13               .09                  .84
 6. Cooperativeness                        .05               .02                  .74
 7. Initiative                             .17               .03                  .81
 8. Responsibility                         .00              −.01                  .71
 9. Mechanical Ability                     .27               .15                  .66
10. Special Abilities                      .02               .02                  .79


Consider the girls first. Item 1, Attitude, Purpose, and Ambition, has
the lowest correlation of any with the total score. This presumably
reflects the same defects noted above, since the two methods should
produce essentially the same findings. Item 9, Mechanical Ability, is
similarly low, apparently following the same parallel. The remaining
coefficients uniformly exceed .70.

For the boys, the relationship between correlations is approximately
the same as for the girls, although the absolute magnitudes are somewhat
smaller. There is consequently less separation of items, with the
exception of Item 10, Special Abilities, which has a correlation of only .39.
An examination of the recommendation blanks for each sex shows little
difference in the kinds of evidence presented. It would seem justifiable,
therefore, to attribute the difference in magnitude of correlation to
variations in the standards of the evaluators, rather than to differences in
the contestants themselves.

It is conceivable that part of the discrepancy in the absolute magni- 
tude of the correlations is due to the varying size of the sample and the 
relative magnitude of the means and standard deviations by sexes. For 
total ratings the girls’ mean was 1.2 higher than the boys’ mean, and the
girls’ standard deviation about 0.7 greater. These statistically signifi- 
cant differences are difficult to explain without more evidence than is 
available, although it is possible that the differences are inherent in the 
recommendations themselves and therefore represent “real” sex
differences.


Table 7

Intercorrelations of Hurdle Variables for Boys

                                  Science         Rank in
                                  Aptitude        High-School
                                  Examination     Class          Recommendation
Science Aptitude Examination          –             .36              .18
Rank in High-School Class            .36             –               .21
Recommendation                       .18            .21               –


Table 8

Intercorrelations of Hurdle Variables for Girls

                                  Science         Rank in
                                  Aptitude        High-School
                                  Examination     Class          Recommendation
Science Aptitude Examination          –             .42              .20
Rank in High-School Class            .42             –               .13
Recommendation                       .20            .13               –





Tables 7 and 8 show the intercorrelations of three hurdle variables,
for boys and for girls. The tables indicate low degrees of relationship
among these variables. This is desirable in that the hurdles reflect
different qualities sought in the contestants. The correlation between the
Science Aptitude Examination and rank in high-school class is the highest.
This would compare favorably with the relationship usually observed
between a “general ability” test and rank in class, if the full range of
talent had been present. The contestants as a whole were practically all
above the middle of their class in scholarship.


Summary

Recommendations of a specific anecdotal type, which were used as one
of the selectors in the First Annual Science Talent Search, were studied
in some detail.


1. A random sample of these recommendations was studied in
considerable detail. Using 25 different raters, each rater evaluating 50
recommendations, the estimated reliabilities of the total rating scores
ranged from .50 to .86 with a mean value of .67. These were made using 
a function which gives an underestimate. When estimates of reliability 
were made by the correlation of first and second ratings of the same 
blanks, the estimated reliability was .90. 

2. Reliability estimates (rate-rerate) for the ten individual categories
showed two of the items with coefficients below .40. The remaining
eight had coefficients between .60 and .87. A well trained severe rater
showed rate-rerate coefficients of .90 or higher for all but the same two
items as above.

3. A well trained, conscientious rater can evaluate this kind of
anecdotal material with a high degree of consistency. Halo effect, variation
in valuation of items, inadequate following of directions, and oversolicitous
human sympathy apparently are the chief sources of unreliability.

4. Two of the item categories seemed unclear, suggesting revision of 
the item or the directions (both have been done). 

5. The recommendation scores show quite low correlation with the 
other selector scores available (Science Aptitude Examination scores, and 
Relative Rank in High-School Class). 


Received September 19, 1946. 





What Does Americanism Mean to the American People? 


Henry C. Link 
The Psychological Corporation, New York City 


This survey is a part of the Psychological Barometer series, the oldest 
nation-wide poll of public opinion and buying habits now in existence. 
Begun in March 1932, these surveys are made four times a year with 
10,000 personal interviews and twice a year with 5,000 interviews. The 
present survey was made with 956 rural interviews in addition to the 
5,000 urban interviews. 

The questions on Americanism were developed in a series of fifteen 
separate tests, or pilot studies, over almost a year’s time. One test was 
made with fifty professors of history in as many universities throughout 
the country. The questions dealt not only with traditional aspects of 
Americanism, but with such recent developments as government housing 
and socialized medicine. In other words, Americanism was treated as a 
dynamic rather than as a static concept. A few of the questions were 
asked in the October 1946 survey and reported. These, together with 
those asked in the present survey, are reported here. The dates in each 
case are specified. 

The urban survey was made with 5,000 interviews, during April 1947, 
by 415 interviewers under the supervision of 130 psychologists and in 134 
cities and towns. It represents a true cross-section of the urban popula- 
tion. Two questionnaires were used, one with one-half the sample, or 
2,500 people, the other questionnaire with the other half of the 5,000 
people. These two sub-samples were comparable by geographic, sex, 
socio-economic, and other criteria. Each question reported in this study
was asked of one or the other of these samples of 2,500. The 956 rural
interviews were made in 74 rural localities throughout the nation. 

Sampling Method. A modified area sampling method was used. All
interviews were assigned by the local supervising psychologist by blocks
and streets in accordance with maps constructed to designate the proper
socio-economic levels. These maps are made to divide the population
into four principal groups: the “A” group, consisting primarily of owners
and executives; the “B” group, primarily white-collar and semi-professional;
the “C” group, or skilled factory and transportation workers; and the
“D” group, or the less skilled. About 31% of the sample are union












426 Henry C. Link 


members. All interviews were made in the home, but only one in a 
family; half were made with women, half with men. 


Compulsory Military Training 
Q. “As you know, there has been much talk about what Americanism means. Is 
compulsory military training in peacetime a good thing for America?” 








                                        Socio-Economic Groups
                           Total
Answers, April 1947        Urban      A      B      C      D     Rural
Yes                         73%      72%    74%    72%    76%     71%
No                          21       23     21     22     18      20
Uncertain                    6        5      5      6      6       9
Total Interviews          2500      250    750   1000    500     956
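The urban “Total” column agrees with an average of the four socio-economic columns weighted by the numbers interviewed in each group (250, 750, 1,000, and 500, from the “Total Interviews” row). The survey report does not state how its totals were computed, so the following Python sketch is only a consistency check under that assumption; `weighted_total` is a hypothetical helper:

```python
# Group sizes for the urban half-sample, from the "Total Interviews" row
group_sizes = {"A": 250, "B": 750, "C": 1000, "D": 500}


def weighted_total(percents, sizes):
    """Pool group percentages into one overall percentage, weighting by group size."""
    n = sum(sizes.values())
    return sum(percents[group] * sizes[group] for group in sizes) / n


yes_by_group = {"A": 72, "B": 74, "C": 72, "D": 76}
print(weighted_total(yes_by_group, group_sizes))  # 73.4, reported as 73%
```

If the same group sizes apply to the other urban half-sample, the housing and medical-service questions check the same way (50.2 against the reported 50%, and 63.0 against 63%).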





Self-Reliance vs. Paternalism 


Q. “As you know, houses and apartments are built by the Government and rented 
to people at rents below the actual cost. Is this good for America or bad?” 








                                          Socio-Economic Groups
                             Total
Answers, April 1947          Urban      A      B      C      D     Rural
Good Americanism              50%      37%    45%    52%    61%     42%
Bad Americanism               39       51     46     35     27      40
Uncertain or Don’t Know       11       12      9     13     12      18





Q. “Which is better for America: (a) to have the Government give free doctor and
medical service which would be paid for by a tax like the Social Security tax; or (b) the
present system of medical service?”

                                          Socio-Economic Groups
                             Total
Answers, April 1947          Urban      A      B      C      D     Rural
Present system better         63%      73%    69%    62%    51%     57%
Socialized medicine better    30       21     25     31     39      31
Uncertain                      7        6      6      7     10      12





The Wagner proposal for socialized medicine provides for specific 
taxes to pay for the medical services and therefore it seemed proper to 
include the mention of taxes in the question. To mention the taxes 
needed to pay for social measures is considered by some research men as 
introducing a bias in the question. The Government-subsidized housing 
projects do not call for specific taxes. If taxes had been mentioned in
this question, the answers might have been less favorable to government
housing.





What Does Americanism Mean? 427 


Q. “Which do you think is better Americanism: (a) every man should accept the 
responsibility for getting his own job and a living, or (b) the government should see to 
it that every man has a job and a living?” 








                                                Socio-Economic Groups
Answers, Oct. 1946                     Total      A      B      C      D
Every man should accept
  responsibility                       75.8%     88%    85%    75%     ?
The Government should see to it        18.2       8     11      ?      ?
Don’t know                              6.0       4      4      6      ?





Private Capitalism vs. State Capitalism 


Q. “Is private capitalism and the profit system good Americanism or bad?” 








                                                                  Union     Non-
Answers, Oct. 1946      Total      A      B      C      D        Members   Union
Good Americanism        67.4%     85%    76%    63%    55%         59%      72%
Bad Americanism         15.4       9     13     17     18          20       13
Don’t know              17.2       6     11     20     27          21       15





Q. “The Government should own and run large businesses such as the railroads,
telephone and telegraph, life insurance, gasoline companies, etc.; would this be good
Americanism or bad?”

                                  Socio-Economic Groups
Answers, Oct. 1946      Total      A      B      C      D
Bad Americanism         68.6%      ?      ?      ?     64%
Good Americanism        21.8       ?      ?      ?     26
Don’t know               9.6       ?      ?      ?     10





Q. “Which of these would you say were good for America and which bad: Fascism, 
Communism, Labor Unions, Socialism, Advertising?” 








                        Good Americanism    Bad Americanism     Uncertain
Answers, April 1947     City      Farm      City      Farm      City    Farm
Fascism                  1%        2%        94%       80%        5%     18%
Communism                1         3         95        83         4      14
Labor Unions            61        51         28        33        11      16
Socialism               15        18         72        55        13      27
Advertising             91        90          5         4         4       6













Incentives and Privileges 


Q. “No one should be allowed to make or keep more money than $25,000 a year;
is that good Americanism or bad?”

                                  Socio-Economic Groups
Answers, Oct. 1946      Total      A      B      C      D
Bad Americanism         65.4%     81%    76%    61%    50%
Good Americanism        26.5      14     19     30     38
Don’t know               8.1       5      5      9     12





Q. “Is it good Americanism for some people to send their children to private schools 
and colleges while others send their children to public schools?” 








                                          Socio-Economic Groups
                             Total
Answers, April 1947          Urban      A      B      C      D     Rural
Good Americanism              59%      73%    64%    57%    51%     46%
Not good Americanism          34       20     30     36     41      42
Uncertain                      7        7      6      7      8      12





In October 1946, the following question was asked, with results
indicating considerable faith in the concept of differential incentives and
rewards.

Q. “Is it good Americanism for some people to have big houses and high incomes,
or is it better Americanism for all people to have about the same of everything and be
on about the same financial level?”

                                    Socio-Economic Groups
Answers, Oct. 1946          Total      A      B      C      D
Big houses, high incomes     62%      79%    75%    60%    39%
Same financial level         28       15     16     31     46
Uncertain                    10        6      9      9     15













Differences Between American Democracy, Communism and Fascism 


Q. “What would you say is the difference between Fascism as it was in Germany,
Communism as it is in Soviet Russia, and Democracy as it is in the United States? For
instance, which ones have freedom of speech, etc. (for items mentioned below)”:

                                                                            None,
Answers, Oct. 1946                        Fascism   Communism   Democracy   Don’t Know
Freedom of speech                           1%         2.5%        96.0%        3.8%
Labor unions which are free to strike       2.0        5.4         92.4         4.7
Religious freedom                           3.0        6.4         96.9         2.6
Private ownership of farms and industry     9.9        3.5         94.5         4.2
Racial equality                             2.9       14.0         80.7        12.2
Freedom to save and accumulate property     6.9        2.4         95.8         3.7
Freedom to choose your own job and
  business                                  4.9        3.7         96.2         3.6





These answers do not always add up to 100% because sometimes two
were mentioned. In general, the enormous differences between American
democracy and the German and Russian forms are well recognized.
Whether the differences between Russian Communism and German
Fascism are correct, for instance, in regard to religious and labor freedom,
is a question.

Unions and the Closed Shop
Q. “Is it good Americanism for workers to be allowed to organize into unions?” 








Answers, Oct. 1946            Total     Union Members     Non-Union
Yes, good Americanism         82.4%          89%              80%
No, not good Americanism       9.3            5               11
Don’t know                     8.3            6                9





Q. “Is the closed union shop where every worker must be a union member before 
he can get or hold a job good Americanism or bad Americanism?” 








Answers, Oct. 1946            Total     Union Members     Non-Union
Bad Americanism               67.3%          55%              73%
Good Americanism              21.3           34               15
Don’t know                    11.4           11               12













Racial Equality in Employment 


Q. “Jews and Gentiles, Negroes and whites, all should have an equal chance at any
job: is that good Americanism or not?”

                                  Socio-Economic Groups
Answers, Oct. 1946      Total      A      B      C      D
Good Americanism        84.1%     85%    84%    83%    86%
Bad Americanism         11.6      12     12     13      8
Don’t know               4.3       3      4      4      6





Q. “The belief that the Negroes are inferior to the whites: is that good Americanism
or bad?”

                                  Socio-Economic Groups
Answers, Oct. 1946      Total      A      B      C      D
Bad Americanism         72.2%     72%    72%    74%    69%
Good Americanism        17.5      16     19     16     17
Don’t know              10.3      12      9     10     14





Where Understanding of Americanism Is Derived 


Q. “Where would you say you have gotten your best understanding of Americanism: 
(a) from courses in school; (b) from magazines; (c) from the radio; (d) from newspapers; 
(e) from books; (f) from the church; (g) from some other source?” 








Answers, Oct. 1946        Total Per Cent*
Courses in school               37
Magazines                       19
Radio                            ?
Newspapers                       ?
Books                            ?
Church                           ?
Other                            ?
All                              ?
Don’t know                       ?

* The total per cents in this question add up to more than 100 because some people
gave more than one answer.


Received June 10, 1947. 
Early publication. 




















The Effect of “Look” and “Read” Directions upon the 
Attention Value of Illustrations and Texts 
in Magazine Advertisements 


E. J. Asher and David Kahn 
Division of Education and Applied Psychology, Purdue University 


In experimental studies of the attention value of portions of magazine
advertisements, such as that reported by Karslake,¹ where attention was
measured by the Purdue Eye-Camera, there is a distinct possibility that
the directions to the subjects might create a set favorable to one or
another portion of the advertisement, and unfavorable to other portions.
For example, the directions “look at the advertisements in this magazine” 
might create a set which favors attention to illustrations whereas “read 
the advertisements in this magazine” might result in a greater amount 
of attention to the reading matter in the advertisement. 

This study was set up to determine whether, in an experimental study 
of attention in advertising, the directions to the subjects affect the relative 
attention value of portions of an advertisement. Specifically, this study 
aimed to determine the effect of “look” and “read” directions upon the
relative attention value of illustrations and texts in full page magazine 
advertisements. 


Procedure 


Twelve full page advertisements were selected from current numbers 
of the Saturday Evening Post. Each advertisement consisted of a single, 
unitary illustration which occupied the upper part of the page, and a 
section of reading matter situated below the illustration. Each headline 
was a part of the section of reading material. These advertisements were 
assembled in a dummy magazine in such a way that each one occupied a 
righthand page. A full page of reading matter, without comics or poetry, 
was pasted on the lefthand page opposite each advertisement. These ad- 
vertisements were assembled with outside covers to form a flat magazine. 
Each page was tabbed so the subject could turn it without loss of time. 

The attention value of the illustrations and the texts of these
advertisements was measured by means of the Purdue Eye-Camera.² This eye


1 Karslake, J. S., The Purdue eye-camera, J. appl. Psychol., 1940, 24, 417-440. 
2 Op. cit. 











432 E. J. Asher and David Kahn 


camera set-up consists of a stand on which the magazine is placed, a half- 
silvered mirror mounted in front of the stand, and an 8 mm. motion picture 
camera mounted slightly forward of the stand and above the position 
of the subject when he is seated in front of the stand. Two 150 watt 
electric light bulbs mounted on either side of the half-silvered mirror 
illuminate the magazine and the subject’s face. The reflection of the 
subject’s eyes and the upper part of his face as he looks through the mir- 
ror at the magazine are photographed by the motion picture camera. 
Enough of the bottom part of each advertisement appears on each frame 
of the motion picture film to identify the page. It is possible to identify 
the part of the page which the subject is looking at by studying the motion 
picture film a frame at a time. A count of the number of frames during 
which the subject’s eyes are fixed on a given part of the page gives a meas- 
ure of the time the subject was looking at that particular portion of the 
page. It is also possible to determine what part of the page is fixated first.
First fixations and the total time spent on a given portion of the adver- 
tisement were used as measures of attention value. 

The 81 college students used as subjects in this experiment were
divided into two groups. Each of the 39 subjects in the “read” group was
seated in front of the half-silvered mirror so that he could see the mag- 
azine, and close enough so that he could, by reaching under the mirror, 
turn the pages of the magazine. With the magazine closed and the mirror 
adjusted so that the reflection of the subject’s face could be picked up by 
the motion picture camera, the subject was given the following instruc- 
tions: “Here is a dummy magazine containing some full page advertise- 
ments. I want you to read each advertisement as you come to it just as 
you might do if you were in your room without anything to do but read 
this magazine. Read the advertisements in order, turning one page at 
a time.” The camera was started and the subject was told to begin. 

The 42 subjects in the “look” group were handled in the same way
except that each one was given the following directions: “Here is a dummy
magazine containing some full page advertisements. I want you to look 
at each advertisement as you come to it just as you might do if you were 
killing time in a dentist’s office by looking through this magazine. Look 
at the advertisements in order, turning one page at a time.” 

The motion picture film on which the subject’s looking responses were 
recorded was analyzed to discover (1) what part of each page was fixated 
first, (2) the number of frames the subject looked at the illustration in 
each advertisement, and (3) the number of frames the subject looked at 
the text in each advertisement. The speed of the camera was such that 
the time spent on each portion of the advertisement could be determined 
by multiplying the number of frames by .4 second. 
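The scoring just described (the portion fixated first, and the frames on each portion converted at .4 second per frame) can be sketched in Python. The per-frame codes, the labels "illustration" and "text", and the function name `score_page` are hypothetical; in the study the coding was done by inspecting the film a frame at a time:

```python
from collections import Counter

SECONDS_PER_FRAME = 0.4  # conversion factor given in the text


def score_page(frame_codes):
    """Summarize one subject's film record for one advertisement.

    frame_codes holds one code per frame: "illustration", "text", or None
    when the eyes are not fixed on either portion.
    Returns (portion fixated first, seconds spent on each portion).
    """
    fixated = [code for code in frame_codes if code is not None]
    first = fixated[0] if fixated else None
    frames_per_portion = Counter(fixated)
    seconds = {portion: n * SECONDS_PER_FRAME for portion, n in frames_per_portion.items()}
    return first, seconds


codes = [None, "illustration", "illustration", "text", "text", "text", None]
first, seconds = score_page(codes)
print(first)    # illustration
print(seconds)  # about 0.8 s on the illustration and 1.2 s on the text
```

Averaging these per-subject seconds over a group, advertisement by advertisement, gives averages of the kind plotted in Figures 1 and 2, and tallying the first values gives counts of the kind in Table 1.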





Effect of “Look” and “Read” Directions


Results 


The average number of seconds spent on the illustration in each of the
12 advertisements under “read” and “look” directions is shown in
Figure 1. It will be noted that the average time spent on the illustrations is
higher in some degree for the “look” group than for the “read” group in
eleven of the twelve advertisements. In only one advertisement, however,
is the difference statistically significant. In advertisement no. 3, the
difference between the averages is 3.14 times its standard error. The
difference between the averages in advertisement no. 5 is 1.86 times its
standard error. In advertisement no. 2, the difference is 1.09 times its
standard error. The critical ratios in the remaining advertisements are
below 1.00. It appears, therefore, that the amount of time spent on these
illustrations is not affected in a significant way by the difference in the
directions, in spite of the rather consistent tendency of the averages for
the “look” directions to be higher than the averages for the “read”
directions.

Fig. 1. Average time spent on illustrations under “look” (solid bars) and
“read” (shaded bars) directions
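The critical ratios quoted above are, in the usual formulation, the difference between two group averages divided by the standard error of that difference. The paper does not print its computing formula; the Python sketch below assumes independent groups (different subjects received the two sets of directions) and uses the N − 1 form of the standard deviation, whereas some texts of the period used N. The function name and the sample times are hypothetical:

```python
from math import sqrt
from statistics import mean, stdev


def critical_ratio(group1, group2):
    """(M1 - M2) divided by the standard error of the difference, for independent groups."""
    if len(group1) < 2 or len(group2) < 2:
        raise ValueError("each group needs at least two observations")
    se1 = stdev(group1) / sqrt(len(group1))  # standard error of the first mean
    se2 = stdev(group2) / sqrt(len(group2))  # standard error of the second mean
    se_diff = sqrt(se1 ** 2 + se2 ** 2)  # no covariance term: the groups are independent
    return (mean(group1) - mean(group2)) / se_diff


# Hypothetical seconds on one illustration for a few "look" and "read" subjects
look_times = [5.2, 6.0, 4.4, 7.2, 5.6]
read_times = [4.0, 4.8, 3.6, 5.2, 4.4]
print(round(critical_ratio(look_times, read_times), 2))
```

By the standard applied in the text, a ratio of 3.14 is treated as significant, while ratios of 1.86 and below are not.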

It will be noted further in Figure 1 that there is a marked tendency for
the illustration with the highest attention value under the “look”
directions to have the highest attention value under the “read” directions.
This tendency is seen in the rank-order correlation between the attention
values as measured under the two sets of directions. The rank-order
correlation coefficient is .80. This marked degree of correspondence between
the attention values under the two sets of directions tends to indicate 
that the difference in the directions does not materially affect the relative 
attention value of these illustrations. 
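The rank-order coefficients (.80 here, and .88 for the texts) compare how the twelve advertisements rank under the two sets of directions. A minimal Python sketch using the familiar formula rho = 1 − 6Σd²/(n(n² − 1)); tied values receive the average of their ranks, in which case the formula is only approximate. The study’s per-advertisement averages appear only in Figures 1 and 2, so the input below is hypothetical, as are the function names:

```python
def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value in sorted order ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Rank-order correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical average seconds on four illustrations under each set of directions
look = [6.1, 3.2, 4.8, 5.0]
read = [5.7, 2.9, 4.9, 4.1]
print(spearman_rho(look, read))  # 0.8
```

Only the order of the averages matters: a uniform increase in time under one set of directions leaves rho unchanged, which is why a high coefficient can coexist with consistently higher averages for one group.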

The average number of seconds spent on the text in each of the twelve
advertisements by the “read” group and by the “look” group is shown in
Figure 2. It can be seen that the average time spent on the text is higher
in some degree under the “read” directions in eleven of the twelve
advertisements. In no case, however, is the difference between the average
for the “read” directions and the average for the “look” directions
statistically significant. All of the critical ratios are below 1.00. This
means that the amount of time spent on the texts is not affected
significantly by the difference in the directions, in spite of the consistent
tendency of the averages under the “read” directions to be slightly higher
than the averages under the “look” directions. That the relative attention
value of the texts is not affected by the difference in the directions is seen
in the rank-order correlation between the attention values of the texts
under the two sets of directions. The rank-order correlation coefficient
is .88.

Fig. 2. Average time spent on texts under “look” (solid bars) and
“read” (shaded bars) directions

Another measure of the effect of the directions on the attention value
of the illustrations and the texts was obtained by counting the number of
first fixations on each under each set of directions. These results are
shown in Table 1. With 39 subjects in the “read” group and with 12
advertisements, there were 468 possible first fixations. Three hundred
and sixty-four of these first fixations, or 77.78 per cent, were found to be
on the illustrations. The remaining 104 first fixations, or 22.22 per cent,
were found to be on the texts. A glance at the results for the “look”
group reveals that 76.91 per cent of the first fixations were on the
illustrations, while 23.09 per cent were on the texts.

Table 1

Number and Per Cent of First Fixations on Illustrations and Texts under
“Read” and “Look” Directions

                       Total       Number of First Fixations     Per Cent of First Fixations
Group                  Possible    Illustrations     Texts       Illustrations     Texts
“Read”                   468            364           104            77.78          22.22
“Look”                   498            383           115            76.91          23.09
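The per cents in Table 1 follow directly from the counts. A short Python check, using only the counts printed in the table (the helper name is hypothetical):

```python
def first_fixation_percents(possible, on_illustrations, on_texts):
    """Per cent of first fixations on illustrations and on texts, to two decimals."""
    if on_illustrations + on_texts != possible:
        raise ValueError("counts must account for every possible first fixation")
    return (round(100 * on_illustrations / possible, 2),
            round(100 * on_texts / possible, 2))


# (total possible, on illustrations, on texts), from Table 1
table1 = {"Read": (468, 364, 104), "Look": (498, 383, 115)}
for group, counts in table1.items():
    print(group, first_fixation_percents(*counts))
# Read (77.78, 22.22)
# Look (76.91, 23.09)
```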

It will be noted that the per cent of first fixations on the illustrations 
is essentially the same for both groups. The per cent of first fixations 
on the texts is likewise essentially the same for both groups. Here, as 
in Figures 1 and 2, it is clear that the attention value of the illustrations 
or the texts is practically the same regardless of whether the subjects are 
instructed to “read the advertisements” or to “look at the advertise- 
ments.” This fact suggests that these subjects have well-formed habits 
of looking at a magazine, and that these habits are relatively unaffected 
by the nature of any verbal directions. It is probable that any kind of 
verbal direction would serve merely to set off these previously formed 
habits. 


Summary and Conclusions 


1. This study attempted to determine the effect of “read” and “look”
directions upon the attention value of illustrations and texts in 12 full
page magazine advertisements.

2. The attention value was measured in terms of (1) the average
amount of time spent in looking at the illustration or text in each
advertisement, and (2) the number of first fixations on the illustrations
and texts in each advertisement.

3. In general, subjects spend as much time and make as many first
fixations on illustrations under “read” directions as they do under “look”
directions.
















4. In general, subjects spend as much time and make as many first
fixations on the texts of advertisements under “look” directions as under
“read” directions.

5. The rank-order of the attention values of the illustrations or texts 
of the advertisements remains essentially the same under the two sets of 
directions. 

6. It is concluded that “read” and “look” directions have no appreci- 
able differential effect upon the attention value of illustrations or texts 
in full page magazine advertisements. 

Received April 22, 1947. 
Early publication. 


Women Students in Liberal Arts, Nursing, and Teacher 
Training Curricula and the Minnesota 
Multiphasic Personality Inventory 


Orpha Maust Lough 
Skidmore College, Saratoga Springs, N.Y. 


With the increased emphasis placed on specialization, educational 
and vocational guidance has become more urgent and is being given 
greater prominence than ever before in both high schools and colleges. 
Since greater attention is also being given to job analysis and to finding 
the right position for the person as well as the right person for the job, 
there has come the realization that factors other than intelligence are re- 
quisite for effectiveness, efficiency, and success. Personnel counselors 
are interested not only in the health, skills, and intelligence possessed by 
those seeking guidance in educational or vocational selection, but also in 
the personality traits that give promise of success in a particular field of 
activity. 

Various personality inventories have been developed and research 
conducted in an attempt to measure the various aspects of personality and 
to determine the relationship between these traits and vocational selection 
and success. The different inventories have been shown to have varying 
degrees of accuracy and prognostic or diagnostic value. At the present 
time the Minnesota Multiphasic Personality Inventory (MMPI) seems to 
be one of the best. 

During the summer session of 1945, the MMPI, individual form, was 
administered to 115 unmarried women students in a liberal arts college
within commuting distance of New York City. In addition to the usual 
liberal arts curricula, this college offers a curriculum for training nurses. 
Sixty-one of the subjects were cadet nurses and 54 were liberal arts stu- 
dents. The results of a study in which this Inventory was given in a New 
York state teachers college to 185 women students, 111 of whom were en- 
rolled in the Music Curriculum and 74 of whom were enrolled in the Gen- 
eral Curriculum for elementary school teachers, have been previously re- 
ported (19). The purposes of this investigation are: 


(1) to compare the women students enrolled in these four curricula on the 
basis of the MMPI; 


(2) to attempt to determine whether there are any significant differences
on any of the scales of the Inventory between the students enrolled
in these four curricula;

(3) to evaluate the usefulness of this Inventory for counseling college 
women with respect to their vocational or educational selections. 

Findings 

The ages of the liberal arts students ranged from 16 years, 5 months 
to 22 years, 8 months with a mean age of 19.4 years. The ages of the 
nurse cadets ranged from 17 years, 5 months to 22 years, 10 months with 
a mean age of 19.0 years. The ages of the teachers college students
enrolled in the Music Curriculum ranged from 16 years, 9 months to 23
years, 1 month with a mean age of 18.6 years while those in the General 
Curriculum ranged from 16 years, 8 months to 22 years, 2 months with 
a mean age of 19.0 years. There are no statistically significant differences 
between the mean ages of the four groups. 

In Table 1 are presented the ranges, means, and standard deviations for 
each group of students on the separate scales of the Inventory. All scores 
are presented as T-scores. 
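A T-score rescales a raw scale score so that the normative group has a mean of 50 and a standard deviation of 10. The linear form is T = 50 + 10(X − M)/SD. This is a minimal sketch. The normative mean and SD below are hypothetical placeholders, not MMPI norms.

```python
def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Linear T-score: 50 + 10 * (raw - mean) / SD, using normative mean and SD."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

# Hypothetical normative mean and SD for one scale (not actual MMPI norms).
NORM_MEAN, NORM_SD = 12.0, 4.0

for raw in (12, 14, 20):
    print(raw, t_score(raw, NORM_MEAN, NORM_SD))
# prints:
# 12 50.0
# 14 55.0
# 20 70.0
```

A T-score of 55 lies one-half standard deviation above the normative mean. A T-score of 70 lies two standard deviations above it.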

In Table 2 is given a comparison of the women students enrolled in the
four different college curricula on the separate scales of the Inventory,
showing the differences between mean T-scores, the standard errors of the
differences, and the critical ratios.

From the data given in Table 2, it is apparent that there are no statis- 
tically significant differences on any of the scales of the Inventory between 
women students enrolled in the four different curricula. 
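For each pair of curricula, Table 2 gives the difference between mean T-scores, its standard error, and the critical ratio. The sketch below shows that computation under two assumptions: the groups are independent, and the standard error of a mean is SD/√N. The text does not state which variant was used. The means and SDs are hypothetical; only the group sizes come from this study. Under a normal approximation, a common convention treats |CR| ≥ 1.96 as significant at the .05 level (two-tailed).

```python
import math

def critical_ratio(m1, sd1, n1, m2, sd2, n2):
    """Difference of two independent means, its standard error, and the critical ratio."""
    se1 = sd1 / math.sqrt(n1)               # standard error of the first mean
    se2 = sd2 / math.sqrt(n2)               # standard error of the second mean
    diff = m1 - m2                          # minus signs may occur
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return diff, se_diff, diff / se_diff

# Hypothetical T-score means and SDs; N = 61 (nurse cadets), N = 111 (Music).
diff, se_diff, cr = critical_ratio(55.0, 10.0, 61, 52.0, 10.0, 111)
print(f"D = {diff:.2f}, SE = {se_diff:.2f}, CR = {cr:.2f}")  # D = 3.00, SE = 1.59, CR = 1.88
```

With these made-up values, a 3-point difference in mean T-score gives a critical ratio just under 1.96.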

From the data given in Table 1, it is apparent that the means on each 
of the separate scales of the MMPI for the students in all four curricula 
approach a fairly straight line at the T-score mean level of 50. Although 
the critical ratios as reported in Table 2 show no statistically significant 
differences on the various scales, it would seem that some differences in 
trends may be noted in the different groups. 

On the Hypochondriasis (Hs), Depression (D), Interest (Mf), and
Schizophrenia (Sc) scales, the nurse cadets have the lowest mean score and
the Music students the highest. The mean score of all four groups is
approximately 52 on the Hysteria (Hy) scale. On the Psychopathic Deviate
(Pd) and the Paranoia (Pa) scales, the nurse cadets show the lowest and
the other three groups approximately the same mean T-score. On the
Psychasthenia (Pt) scale, the nurse cadets rank lowest and the students
in the General Curriculum highest. On the Hypomania (Ma) scale all
groups show a mean T-score approximately one-half a standard deviation
above the T-score level of 50. On the Interest (Mf) scale, the mean
T-score of the nurse cadets is one-half a standard deviation above the
T-score level of 50 and is much higher than in the other groups, while those
in the Music Curriculum rank lowest on this scale.


Table 1

Comparison of Women College Students Enrolled in Four Different College
Curricula on the Separate Scales of MMPI

[Range, mean (M), and standard deviation (SD) of T-scores on each scale
for Gen. Curr. (N = 74), Music Curr. (N = 111), Nurse Curr. (N = 61), and
Lib. Arts Curr. (N = 54).]


Table 2

Comparison of Women Students in Four College Curricula on the Separate
Inventory Scores, Showing Difference, Standard Error of Difference, and
Critical Ratio of T-scores

[For each of the six pairs of curricula: the difference between mean
T-scores, the standard error of the difference, and the critical ratio, on
each scale. Minus signs may occur among the differences.]

Benton (3) in a comparison of psychiatric ratings with the MMPI 
scores states that there was a significant degree of agreement between the 
psychiatric ratings and the test scores with respect to Psychopathic De- 
viate, Paranoia, and Schizophrenia trends but not a significant degree of 
agreement with respect to the Hypochondriasis, Depression, Hysteria, 
Interest, and Psychasthenia trends. If this conclusion drawn from a 
study of navy patients can be further substantiated and can be vali- 
dated with respect to young women, it would follow that there appears 
to be little tendency in the groups studied in this investigation toward the 
“significant” diagnostic trends. 

It would appear from these data that the nurse cadets are more mas- 
culine in their interests than those students selecting other curricula. 
This trend toward masculine interest would, seemingly, predispose to 
success in a profession which entails “tough-mindedness” in many
circumstances. They are also apparently less inclined to depression, have
higher morale and a more optimistic outlook on life, are prone to accept 
unemotionally and as an inevitable part of their profession, dirty, heavy, 
or routine work, do not resolve their problems by developing undue anxi- 
ety over their health as a means of gaining desired sympathy, are not 
disposed to excessive or needless worry, tend to be extrovertive, and may 
be more psychologically mature, with fewer fears or compulsions, than 
the women students selecting the other three curricula. 

Elwood (7), in a study of college girls and nurses, tested both groups 
on Laird’s Introvert-Extrovert scale and on the Woodworth Neurotic 
Inventory. He concluded that both tests indicated that nurses showed 
far fewer unhealthy emotional outlets and were particularly outstanding 
in extroversion. This study would tend to verify his conclusions. 

In the Liberal Arts group, the mean T-scores on all except two of the 
separate scales closely approach a straight line at the T-score mean level 
of 50. The lowest mean T-score for this group is on the Psychasthenia 
scale and the highest on the Hypomania scale. It would appear that this 
group, by and large, has fewer serious fears or compulsions and is rela- 
tively more self-confident with less inclination to indulge in excessive 
worry than those in the teaching curricula. 

The students in the General Curriculum preparing to be elementary 
school teachers appear in general to be low on the Hypochondriasis and 
Depression scales which seems to imply good morale, optimistic outlook 
on life, less than average depression, little tendency to become unduly 
worried about matters of health or to use general physical complaints to 
gain sympathy.


The students in the Music Curriculum show high points on the Hys- 
teria, Paranoia, and Hypomania scales which, according to the MMPI 
manual, may indicate greater psychological immaturity than in the other 
groups with a tendency toward ideas of persecution, oversensitivity, sus- 
piciousness, and to meet emergencies by developing physical symptoms. 

All of these student groups show some slight disposition toward
Hypomania, which is characterized by over-productivity in thought and action;
ambition, vigor, activity, and enthusiasm, although somewhat depressed
at times; an inclination to undertake too many things at a time, to stir up
projects and then lose interest in them; and a disposition to disregard social
conventions. These data would tend to substantiate the hypothesis set
forth in the study of the teacher-in-training group (19) that a slight ten- 
dency to Hypomania may be characteristic of the late adolescent or young 
adult college woman. 

According to the manual of instructions accompanying the MMPI, 
normal persons do not often score above 70; but if the environmental pres- 
sure is small, or if other personality factors are favorable, a person may 
score over 70 and yet escape need for special attention. In Table 3 is 
shown the percentage of the students in these four curricula with T-scores 
over 70 on the separate scales of the Inventory. 
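Each entry in Table 3 is the share of a curriculum group whose T-score on a scale exceeds 70. In this minimal sketch, "above 70" is read as strictly greater than 70. That reading is an assumption, since the text uses both "above" and "over." The scores are hypothetical.

```python
def percent_above(t_scores: list[float], cutoff: float = 70) -> float:
    """Per cent of a group whose T-scores are strictly above the cutoff."""
    above = sum(1 for t in t_scores if t > cutoff)
    return 100 * above / len(t_scores)

# Hypothetical group of 54: four scores above 70, fifty at or below.
scores = [72, 75, 71, 80] + [50] * 50
print(round(percent_above(scores), 1))  # 7.4
```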


Table 3

Percentage of Women College Students in Four Curricula with T-scores
above 70 on the Separate Scales of MMPI

          Per Cent of 74   Per Cent of 111   Per Cent of 61   Per Cent of 54
          in Gen. Curr.    in Music Curr.    in Nurse Curr.   in Lib. Arts Curr.
MMPI      with T-scores    with T-scores     with T-scores    with T-scores
Scales    above 70         above 70          above 70         above 70

?         0.0              0.0               0.0              0.0
L         0.0              0.0               0.0              1.8
F         1.3              0.0               3.3              1.8
Hs        0.0              1.8               0.0              3.7
D         2.6              0.9               1.6              3.7
Hy        4.0              5.4               3.3              5.5
Pd        6.5              2.7               4.8              7.4
Mf        4.0              1.8               3.3              7.4
Pa        5.4              4.5               0.0              7.4
Pt        1.3              2.7               0.0              1.8
Sc        5.4              3.6               1.6              3.7
Ma        4.0              10.0              9.8              5.5





On the basis of the percentage scoring above 70 on the various scales 
of the Inventory as given in Table 3, it would appear that there may be 
further personality differences among the students enrolled in these four
curricula. The extremely high scores among the General students are 
on the Psychopathic Deviate-Paranoia-Schizophrenia scales, while those 
among the Music students are on the Hypomania-Hysteria scales. The 
largest percentages of T-scores over 70 among the Liberal Arts students
are on the Psychopathic Deviate-Interest-Paranoia scales. The
extremely high scores among the nurse cadets are on the Hypomania scale,
but on three out of the nine “diagnostic” scales none of the nurses scored
over 70 and only a very small percentage on the remaining scales. 
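The reading of Table 3 in the paragraph above can be checked mechanically. This minimal sketch transcribes the Table 3 percentages for the nine clinical scales; ?, L, and F are omitted. The number of scales listed per group is set to match how many the text names for that group, which is an illustrative choice. Ties keep the table's scale order.

```python
SCALES = ["Hs", "D", "Hy", "Pd", "Mf", "Pa", "Pt", "Sc", "Ma"]
TABLE3 = {  # per cent with T-scores above 70, from Table 3
    "General":      [0.0, 2.6, 4.0, 6.5, 4.0, 5.4, 1.3, 5.4, 4.0],
    "Music":        [1.8, 0.9, 5.4, 2.7, 1.8, 4.5, 2.7, 3.6, 10.0],
    "Nurse":        [0.0, 1.6, 3.3, 4.8, 3.3, 0.0, 0.0, 1.6, 9.8],
    "Liberal Arts": [3.7, 3.7, 5.5, 7.4, 7.4, 7.4, 1.8, 3.7, 5.5],
}
NAMED_IN_TEXT = {"General": 3, "Music": 2, "Nurse": 1, "Liberal Arts": 3}

def top_scales(row: list[float], k: int) -> list[str]:
    """The k scales with the largest percentages (stable sort keeps table order on ties)."""
    ranked = sorted(range(len(SCALES)), key=lambda i: -row[i])
    return [SCALES[i] for i in ranked[:k]]

def zero_scales(row: list[float]) -> list[str]:
    """Scales on which no member of the group scored above 70."""
    return [s for s, pct in zip(SCALES, row) if pct == 0.0]

for group, row in TABLE3.items():
    print(group, top_scales(row, NAMED_IN_TEXT[group]), zero_scales(row))
```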

From these data it would appear that these 300 unmarried college
women from a state teachers college and a liberal arts college are, on the
whole, normal and stable. Seemingly the most marked differences as
measured by the scales of the MMPI distinguish the nurse cadets by
their apparent tendency to be more stable, more unemotional, and less 
easily disturbed than college women students selecting any of the other 
three curricula. 

Although further investigation is needed, it would appear that the 
MMPI may be of some value in counseling young women college students. 
Those who rank below or approximately at T-score 50 on the various 
scales apparently may be safely directed into professions or vocations 
where emotional stability is an important factor and which entail working 
with other people, adjusting and adapting to changing conditions. On 
the other hand, those who score high, especially on the Paranoia, Schizo- 
phrenia, or Psychasthenia scales, probably should be guided into life 
work which is largely individualistic in nature and does not involve
working with others. Those who rank high on the Interest scale would
seemingly succeed in work involving routine or some undesirable physical
aspects while those who rank low on this scale should apparently be direc- 
ted into aesthetic types of work. Those who rank high on the Hypochon- 
driasis or Hysteria scales are also probably more adapted to aesthetic 
rather than routine types of work. On the whole, however, the MMPI 
would seem to have little or no significance for educational or vocational 
guidance except when a student exhibits a marked deviation in the direc- 
tion of abnormality. 


Summary 


From this investigation of 300 unmarried college women, 185 from 
a state teachers college and 115 from a liberal arts college, it would appear 
that, on the basis of the individual form of the MMPI: 


1. They are a relatively stable, normal group with a very slight ten- 
dency toward Hypomania. 

2. There are no significant differences on the separate scales between 
those preparing to teach in the elementary grades or to teach public
school music, those preparing to be nurses, and those enrolled in the
traditional curricula of a liberal arts college.

3. On the basis of the mean T-scores, the nurse cadets appear to have 
somewhat more masculine interests and to be more stable and unemo- 
tional than those in the other groups. 

4. On the basis of this study and a previous report by the
writer (19), the MMPI has little or no value in educational selection; it
is not a useful instrument for differentiating between those who are more
suited for one occupation than another. The primary value of the MMPI
seems to be to give some insight into the emotional life of the individual
and to detect those who may be in need of psychological or psychiatric
counseling. Hunt and Stevenson (17) write, “Thus it is aimed not only at
detecting the deviate but at establishing the clinical direction or
directions in which the deviation will be found.”


Received August 23, 1946. 


References 


1. Altus, W. D., and Bell, H. M. The validity measures of certain maladjustment in 
an army special training center. Psychol. Bull., 1945, 42, 98-103. 

2. Benton, A. L. The Minnesota multiphasic personality inventory in clinical
practice. J. nerv. ment. Dis., 1945, 102, 416-420.

3. Benton, A. L., and Probst, K. A. A comparison of psychiatric ratings with
Minnesota multiphasic personality inventory scores. J. Ab. & Soc. Psychol.,
1946, 41, 75-78.

4. Capwell, D. F. Personality patterns of adolescent girls: I. Girls who show im- 
provement in I.Q. J. appl. Psychol., 1945, 29, 212-228. 

5. Capwell, D. F. Personality patterns of adolescent girls: II. Delinquents and
nondelinquents. J. appl. Psychol., 1945, 29, 289-297.

6. Drake, L. E. A social I. E. scale for the Minnesota multiphasic personality inven- 
tory. J. appl. Psychol., 1946, 30, 51-54. 

7. Elwood, R. H. The role of personality traits in selecting a career; the nurse and 
the college girl. J. appl. Psychol., 1927, 11, 199-201. 

8. Harmon, L. R., and Wiener, D. N. Use of the Minnesota multiphasic personality 
inventory in vocational advisement. J. appl. Psychol., 1945, 29, 132-141. 

9. Hathaway, S. R., and McKinley, J. C. A multiphasic personality schedule: I.
Construction of the schedule. J. Psychol., 1940, 10, 249-254.

10. McKinley, J. C., and Hathaway, S. R. A multiphasic personality schedule: II.
A differential study of hypochondriasis. J. Psychol., 1940, 10, 255-268.

11. Hathaway, S. R., and McKinley, J. C. A multiphasic personality schedule: III.
The measurement of symptomatic depression. J. Psychol., 1942, 14, 73-84.

12. McKinley, J. C., and Hathaway, S. R. A multiphasic personality schedule: IV.
Psychasthenia. J. appl. Psychol., 1942, 26, 614-624.

13. McKinley, J. C., and Hathaway, S. R. The Minnesota multiphasic personality
inventory: V. Hysteria, hypomania and psychopathic deviate. J. appl. Psychol.,
1944, 28, 153-174.

14. Hathaway, S. R., and McKinley, J. C. Manual for the Minnesota multiphasic
personality inventory. New York: The Psychological Corporation, 1944.

15. Hathaway, S. R. The personality inventory as an aid in the diagnosis of
psychopathic inferiors. J. consult. Psychol., 1939, 3, 112-117.

16. McKinley, J. C., and Hathaway, S. R. The identification and measurement of
the psychoneurosis in medical practice: The Minnesota multiphasic personality
inventory. J. Am. med. Ass., 1943, 122, 161-167.

17. Hunt, W. A., and Stevenson, I. Psychological testing in military clinical
psychology: II. Personality testing. Psychol. Review, 1946, 53, 107-115.

18. Leverenz, G. S. Minnesota multiphasic personality inventory: An evaluation of
its usefulness in the psychiatric service of a station hospital. War Med., 1943,
4, 618-629.

19. Lough, O. M. Teachers college students and the Minnesota multiphasic
personality inventory. J. appl. Psychol., 1946, 30, 241-247.

20. Meehl, P. E. An investigation of a general normality or control factor in
personality testing. Psychol. Monogr., 1945, 59, 1-62.

21. Schiele, B. C., Baker, A. B., and Hathaway, S. R. The Minnesota multiphasic
personality inventory. Journal-Lancet, 1943, 63, 292-297.

22. Schmidt, H. O. Test profiles as a diagnostic aid: The Minnesota multiphasic
inventory. J. appl. Psychol., 1945, 29, 115-131.











A Reply to Winfield’s Study of the Multiple Choice Rorschach 


Henry L. Sisk 
Stevenson, Jordan and Harrison, Inc., Chicago 


In a recent paper Winfield¹ describes the data she obtained in administering
the Multiple Choice Rorschach (MCR) and the Minnesota Multiphasic
Personality Inventory (MMPI) to 181 enlisted members of the
Marine Corps Women’s Reserve. The MMPI was scored for all measures
except masculinity-femininity. Two methods were followed in scoring
the MCR. The first of these considered each “failure” (i.e., “nothing
at all” checked or no mark made within an A, B, or C section of the blank)
as one poor answer; the second counted each failure as two poor answers.
In both instances the total number of responses was
computed and the percentage of “poor” answers was considered the final 
score. The cutting point of poor answers above which the records were 
considered abnormal was 40%. As might be expected a greater number 
of records with poor answers of 40% or more were obtained when the 
second method of scoring each failure as two poor answers was used. The 
first method yielded 26 cases with more than 40% poor answers while the 
second method included the initial 26 and an additional 5 cases, making 
a total of 31. 
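The two scoring methods differ only in the weight given to a failure. This is a minimal sketch. The passage does not say how failures enter the total, so the code assumes that a failure adds its weight both to the poor answers and to the total responses; that rule is hypothetical. The record is also hypothetical. Under this rule, method two can only raise a record's percentage, which is consistent with method two flagging the original 26 cases plus 5 more.

```python
CUTOFF = 40.0  # per cent poor answers at which a record was treated as abnormal

def mcr_percent_poor(poor_checked: int, other_checked: int, failures: int,
                     failure_weight: int) -> float:
    """Per cent 'poor' answers on one MCR record.

    Hypothetical counting rule: each failure adds `failure_weight` poor
    answers and the same number of responses to the total
    (1 for method one, 2 for method two).
    """
    poor = poor_checked + failure_weight * failures
    total = poor_checked + other_checked + failure_weight * failures
    return 100 * poor / total

# Hypothetical record: 6 poor and 18 other checked answers, plus 4 failures.
method_one = mcr_percent_poor(6, 18, 4, failure_weight=1)  # 10/28, about 35.7%
method_two = mcr_percent_poor(6, 18, 4, failure_weight=2)  # 14/32 = 43.75%
print(method_one >= CUTOFF, method_two >= CUTOFF)  # False True
```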

Raw data for all of the cases with abnormal MCRs are included in two 
of the tables presented. Table 5 gives the data for those cases whose crit- 
ical MCR scores were raised by the second method of scoring. Table 6 
is made up of the 26 cases who obtained critical scores of 40% or greater 
when the first method of scoring the MCR was used. In her conclusion 
she states that “Since there is no correspondence between the scores made 
on the Multiple Choice Rorschach and the Minnesota Multiphasic Per- 
sonality Inventory, nor any observed behavior which warranted a diag- 
nosis of maladjustment such as the extreme scores made on the test would 
indicate, it must be concluded that the MCR differentiates something 
other than it purports to do and that further research and standardization 
are necessary before the test can be used on a similarly selected sample for 
the screening of maladjusted individuals.” 

This paper will limit itself to a criticism of the first clause of the
conclusion, namely, the unqualified lack of correspondence between the
scores made on the MCR and the MMPI.²


¹ Winfield, M. C. The use of the Harrower-Erickson Multiple Choice Rorschach
Test with a selected group of women in military service. J. appl. Psychol., 1946, 30,
481-487.

In interpreting her data, Winfield considered only scores obtained on 
the hypochondriasis, depression, hysteria, psychopathic deviate, para- 
noia, psychasthenia, schizophrenia, and hypomania scales as possible 
sources of deviations. The Lie score was overlooked entirely. According
to the manual,³ “A high L score does not entirely invalidate the other
scores but indicates that the true values are probably higher than those
obtained.”⁴

While it is entirely true that none of the 26 cases with significant MCR
scores when scored by method one had any significant scores on the 8
major scales of the MMPI, 21 of the 26 cases, or 81%, had Lie scores
above a T-score of 50. When MCR scores are computed by the second
method, an additional five cases have 40% or more poor answers. Here again
4 out of 5, or 80%, have an elevated L score. Clearly then there is a
positive relationship between the MCR scores greater than 40%, regard- 
less of the method used in scoring, and the L score. 
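The percentages in this paragraph follow directly from the reported counts. This is a minimal sketch. The L-score list in the second part is hypothetical. It only illustrates the counting rule, which assumes "above a T-score of 50" means strictly greater than 50.

```python
def percent(part: int, whole: int) -> float:
    """Simple percentage."""
    return 100 * part / whole

def count_elevated(l_scores: list[int], cutoff: int = 50) -> int:
    """Number of L (Lie) T-scores strictly above the cutoff."""
    return sum(1 for t in l_scores if t > cutoff)

# Counts reported in the text.
method_one = percent(21, 26)      # 26 cases flagged by method one
method_two_added = percent(4, 5)  # 5 cases added by method two
print(f"{method_one:.0f}% {method_two_added:.0f}%")  # 81% 80%

# Hypothetical L T-scores, to show the counting rule ("above 50").
print(count_elevated([46, 53, 60, 50, 56]))  # 3
```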

Certain characteristics may be noted concerning the five cases scored 
by method one on the MCR with normal L scores and more than 40% 
poor answers on the Rorschach: all have hypochondriasis scores of 41 or 
less, all depression scores are 42 or less, hysteria scores are 50 or less, psy- 
chopathic deviate measures 47 or lower, paranoia and psychasthenia 50 
or less, and schizophrenia less than 52; and in all cases the manic scale is 
elevated in relation to the other measures. The profiles formed by these 
individuals are relatively flat with very few measures exceeding a T-score 
of 50 and the hypomania score is elevated as compared to the other meas- 
ures. Thus the relationship between the MCR and the MMPI to be 
found in these cases with normal L scores exists in the patterning of the 
profile of the MMPI rather than the absolute score obtained. 
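The profile pattern described above for the five cases can be written as an explicit check. This is a minimal sketch, and the upper bounds are copied from that description. Two further choices are assumptions: "elevated" is read as Ma exceeding every other listed clinical scale, and Mf is left out because it was not scored. The profiles are hypothetical, and the check is not offered as a diagnostic rule.

```python
# Upper bounds (T-scores) taken from the description of the five cases.
UPPER = {"Hs": 41, "D": 42, "Hy": 50, "Pd": 47, "Pa": 50, "Pt": 50}
OTHERS = ["Hs", "D", "Hy", "Pd", "Pa", "Pt", "Sc"]  # Mf was not scored

def fits_described_pattern(profile: dict[str, int]) -> bool:
    """Flat profile within the stated bounds, with Ma above every other scale."""
    within_bounds = all(profile[s] <= limit for s, limit in UPPER.items())
    sc_ok = profile["Sc"] < 52
    ma_elevated = profile["Ma"] > max(profile[s] for s in OTHERS)
    return within_bounds and sc_ok and ma_elevated

# Hypothetical profiles.
flat_with_ma = {"Hs": 40, "D": 41, "Hy": 48, "Pd": 45, "Pa": 47, "Pt": 46, "Sc": 50, "Ma": 60}
flat_no_ma = dict(flat_with_ma, Ma=49)
print(fits_described_pattern(flat_with_ma), fits_described_pattern(flat_no_ma))  # True False
```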

There remains one case that had a normal MCR when scored by
method one and 44% poor answers when scored by method two. That
particular individual had two measures greater than 70, hypomania and
schizophrenia; three measures greater than 67, psychopathic deviate,
paranoia, and psychasthenia; depression, 56; hysteria, 52; and an F score
of 80. This type of case is discussed in the manual: “a number of rather
badly neurotic or psychotic subjects obtain high F scores validly. These
persons are betrayed by their very high scores on other scales. . . .”
However, one need not refer to the manual to recognize the abnormality
of this subject, for a change in her MCR score from 28% to 44% indicates
enough “failures” or “Nothing at All” responses to question her
cooperativeness.


² See Harrower-Erickson and Steiner. Large scale Rorschach technique. Springfield,
Ill.: Charles C. Thomas, 1944, for a complete discussion of the applications of MCR
to military situations.

³ Hathaway, S. R., and McKinley, J. C. Booklet for the Minnesota Multiphasic
Personality Inventory. University of Minnesota Press, 1943.

⁴ In this connection the reader is referred to a complete discussion of the K scale:
Meehl, P. E., and Hathaway, S. R. The K factor as a suppressor variable in the
MMPI. J. appl. Psychol., 1946, 30, 525-564.


Conclusions 


A re-evaluation of the data presented by Winfield shows that there 
is a positive relationship between poor scores greater than 40% obtained 
on the MCR and MMPI in that 81% of the 26 cases with more than 40% 
poor answers on the MCR as scored by method one had elevated L scores 
on the MMPI. Eighty per cent or four out of the five cases that had 
greater than 40% poor answers on the MCR when scored by the second 
method have elevated L scores on the MMPI. 

The five cases with normal L scores and greater than 40% poor an- 
swers on the MCR when scored by method one formed profiles that can 
be characterized as being rather flat with the hypomania score elevated 
in relation to the rest of the profile. This illustrates the importance of 
the patterning of the MMPI as a factor in its interpretation as well as 
noting the scores of isolated traits. 

The one remaining case which was classified as having greater than 
40% poor answers on the MCR when scored by method two had a defi- 
nitely abnormal record as measured by the MMPI. 


Received April 7, 1947. 
Early publication. 








A Method and Tables for Obtaining Standard Errors of 
Differences Between Proportions When 
N₁ is Equal to N₂


William H. Lichte 
University of Missouri 


In statistical work there are often instances in which a large number 
of standard errors of differences between proportions must be computed 
when the two groups from which the proportions are obtained are of 
equal size. The most frequent situation of this sort is probably the 
analysis of test items by the critical ratio method. Upper and lower 
criterion groups are selected, such as the highest and lowest thirds in 
terms of scores on the total test, and then, for each item, the critical 
ratio for the difference between the proportion of the upper group passing 
and the proportion of the lower group passing is computed. The method 
to be described and the tables to be presented will greatly simplify com- 
putational procedures for this and similar tasks. This paper will present 
the mathematical basis of the method, the construction of the tables, 
their use to obtain $\sigma_{p_1-p_2}$ (when the obtained proportions and N are
given), and an estimate of the amount of error involved in the use of 
the tables. 
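The item-analysis use described in this paragraph can be sketched directly from the usual formula for the difference between independent proportions, with $\sigma_p = \sqrt{pq/N}$. This is a minimal sketch, and the pass counts and group size are hypothetical.

```python
import math

def se_diff_proportions(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard error of p1 - p2 for independent groups (sigma_p = sqrt(pq/N))."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def item_critical_ratio(passed_upper: int, passed_lower: int, n: int) -> float:
    """Critical ratio for one item, with upper and lower criterion groups of size n."""
    p_upper = passed_upper / n
    p_lower = passed_lower / n
    return (p_upper - p_lower) / se_diff_proportions(p_upper, n, p_lower, n)

# Hypothetical item: 40 of 50 upper-group and 25 of 50 lower-group examinees pass.
print(round(item_critical_ratio(40, 25, 50), 2))  # 3.31
```

Repeating this for every item of a long test is the repetitive computation that the tables are meant to shorten.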

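The method and the tables are made possible by a formula for $\sigma_{p_1-p_2}$
which can be derived when $N_1 = N_2$. The usual formula is

$$\sigma_{p_1-p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2} \qquad (1)$$

Since $\sigma_p = \sqrt{pq/N}$, we may substitute for $\sigma_p^2$ and obtain

$$\sigma_{p_1-p_2} = \sqrt{\frac{p_1q_1}{N_1} + \frac{p_2q_2}{N_2}} \qquad (2)$$

Because $N_1 = N_2$ (call both $N$), this may be rewritten as

$$\sigma_{p_1-p_2} = \sqrt{\frac{p_1q_1 + p_2q_2}{N}} \qquad (3)$$

and putting $N$ under a separate radical gives

$$\sigma_{p_1-p_2} = \sqrt{p_1q_1 + p_2q_2}\,\sqrt{\frac{1}{N}} \qquad (4)$$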




The last formula (4) makes it possible to obtain standard errors of
differences in proportions by (1) constructing a table showing the values of
$\sqrt{p_1q_1 + p_2q_2}$ for representative combinations of proportions and (2)
constructing a second table in which the values from the first table are
multiplied by $\sqrt{1/N}$ for representative Ns. The way in which this was
done is explained in the following paragraphs. 
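Formula (4) splits the standard error into two factors. One depends only on the two proportions, and the other depends only on N. That split is what lets two small tables replace direct computation. This minimal sketch uses hypothetical values to check that the split form agrees with the general formula for independent proportions when both groups have the same N.

```python
import math

def se_general(p1: float, p2: float, n1: int, n2: int) -> float:
    """General form: sqrt(p1q1/N1 + p2q2/N2)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_equal_n(p1: float, p2: float, n: int) -> float:
    """Equal-N form (4): sqrt(p1q1 + p2q2) * sqrt(1/N)."""
    proportions_part = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # first table
    n_factor = math.sqrt(1 / n)                                   # second table
    return proportions_part * n_factor

# Hypothetical proportions and group sizes; the two forms agree whenever N1 = N2.
for p1, p2, n in [(0.80, 0.50, 50), (0.04, 0.11, 120), (0.13, 0.29, 370)]:
    assert math.isclose(se_general(p1, p2, n, n), se_equal_n(p1, p2, n))
print("equal-N form matches the general form")
```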


Construction of the Tables 


If proportions (p-q combinations) are taken to only two decimal places,
the possible ones are .01-.99, .02-.98, .03-.97, . . . , .99-.01, 1.00-.00. In
the formulas, however, p is used only in obtaining the product of p and q;
therefore the number of such products in this series is not 100 but 51,
because the last half of the series merely duplicates the first, with the
order of numbers reversed. If all 51 p-q combinations were used, the
resulting table would be difficult to print, so 25 were selected. The ones
selected were those which gave an equally spaced coverage of the range of
pq products from .00 (.00 × 1.00) to .25 (.50 × .50).

The p-values selected were entered in the left-hand column and
bottom row of Table 1, those yielding the same pq product, such as .60
and .40, or .24 and .76, being put in the same row or column. The body
of the table was then filled in with the values of $\sqrt{p_1q_1 + p_2q_2}$ for all the
combinations of $p_1$ and $p_2$. These values range from .10 to .71, inclusive,
with .11, .12, and .13 omitted. This series was then placed in the first
column of Table 2 and the succeeding columns were filled in by
multiplying the numbers in the first column by $\sqrt{1/N}$ for the N given at the
top of the column. 
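The counting and range statements above can be checked by enumeration. This minimal sketch uses all 51 possible pq products rather than the 25 actually selected for Table 1, which are not listed here. It works in integer units of .0001 so that duplicates are detected exactly. It confirms the count of 51 and the range .10 to .71. It also shows that no pair of two-decimal proportions yields .11, .12, or .13. The last line forms one hypothetical Table 2 cell.

```python
import math

# Every pq product for two-decimal proportions, in units of .0001:
# p = k/100 gives pq = k * (100 - k) / 10000.
PQ = sorted({k * (100 - k) for k in range(1, 101)})

def table1_value(pq1: int, pq2: int) -> int:
    """sqrt(p1q1 + p2q2) to two decimals, returned in hundredths."""
    # The square root of a quantity in units of .0001 is in units of .01.
    return round(math.sqrt(pq1 + pq2))

def table2_entry(value: float, n: int) -> float:
    """A Table 2 cell: the left-hand value multiplied by sqrt(1/N)."""
    return value * math.sqrt(1 / n)

# Left-column proportions exclude .00 and 1.00 (pq > 0); the bottom row may include them.
values = {table1_value(a, b) for a in PQ if a > 0 for b in PQ}
print(len(PQ), min(values), max(values))          # 51 10 71
print([v for v in (11, 12, 13) if v in values])   # []
print(round(table2_entry(0.37, 100), 3))          # 0.037
```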


Use of the Tables 


The way to use the tables can now readily be seen. When the two 
proportions have been obtained from data, find one proportion (or the 
nearest one given in the table) in the first column of Table 1 and the 
other in the bottom row of the table. (Either of the two ps in the row 
or column may be used.) Read the number given where the row and 
column intersect, then go to Table 2 and find this number in the left- 
hand column. In the top row of Table 2 find the N (or nearest N) for 
the group used. The intersection of row and column will then give the
standard error of the difference between proportions. (It should be 
noted, in connection with the use of Table 1, that the proportions .00 
and 1.00 appear only in the bottom row.) 
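The two-step lookup can be imitated in code to see how much rounding it introduces. This is a minimal sketch with several assumptions. Both proportions are taken to appear exactly in Table 1, so choosing the nearest tabled p is not modeled. The Table 1 value is carried to two digits, as the text requires. The Table 2 entry is rounded to three decimals, which is also an assumption. The proportions and N are hypothetical.

```python
import math

def two_step_se(p1: float, p2: float, n: int) -> float:
    """Table procedure: Table 1 value to two digits, then Table 2 to three."""
    table1 = round(math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)), 2)  # Table 1 lookup
    return round(table1 * math.sqrt(1 / n), 3)                    # Table 2 lookup

def exact_se(p1: float, p2: float, n: int) -> float:
    """Formula (4) evaluated without rounding."""
    return math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)) * math.sqrt(1 / n)

# Hypothetical item: proportions .80 and .50, groups of N = 50.
print(two_step_se(0.80, 0.50, 50), round(exact_se(0.80, 0.50, 50), 4))  # 0.091 0.0906
```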

A certain amount of error will result from the use of the tables
because of the limited number of proportions found in the left-hand column
and bottom row of Table 1. Many obtained proportions will not be
found in the table and slightly different proportions will have to be used.
Before discussing the importance of this error, methods for reducing it
will be given.


Table 1

Values of $\sqrt{p_1q_1 + p_2q_2}$ for Given Values of $p_1$ and $p_2$

[Selected values of $p_1$ in the left-hand column and of $p_2$ in the bottom
row; .00 and 1.00 appear only in the bottom row. Entries range from .10 to .71.]


Table 2

Standard Errors for Given Values of $\sqrt{p_1q_1 + p_2q_2}$ and N

[Left-hand column: values of $\sqrt{p_1q_1 + p_2q_2}$ from .10 to .71 (.11, .12,
and .13 omitted); top row: N; each entry is the left-hand value multiplied
by $\sqrt{1/N}$.]

Linear interpolation is a simple method of keeping this error small 
and can be used when one or both obtained ps lie between those found in 
Table 1. As an example of the first case the obtained proportions might 
be .04 and .11. The proportion .04 appears in Table 1, but .11 does not 
and either .10 or .12 must be substituted. However, it is easy to read 
.36 for ps of .04 and .10, to read .38 for ps of .04 and .12, and by inter- 
polation to obtain the value of .37, which is correct to two digits for the 
obtained proportions. As an example of the second case the obtained ps 
might be .13 and .29, neither of which appears in the table. One can 
read the value of .55 for ps of .12 and .28, the value .57 for ps of .14 and 
.30, and by interpolation obtain .56, which is correct to two digits. It 
should be noted that interpolation can be used only to select the best 
two-digit value, because only two digits can be carried to Table 2. 
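The two interpolation examples can be reproduced with a short sketch. One substitution should be noted. In the second example, the author interpolates along the diagonal between two entries. The function below instead uses bilinear interpolation over all four bracketing entries. When one p is itself tabled, this reduces to ordinary linear interpolation. Both approaches give .37 and .56 for the two examples. The function names and the `(lower, upper)` bracket arguments are hypothetical conveniences.

```python
import math

def table1_entry(p1, p2):
    """Two-digit Table 1 entry for a pair of tabled proportions."""
    return round(math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)), 2)

def weight(p, lo, hi):
    """Fraction of the way from lo to hi; 0 when p is itself tabled (lo == hi)."""
    return 0.0 if hi == lo else (p - lo) / (hi - lo)

def interpolated_entry(p1, p2, b1, b2):
    """Bilinear interpolation among the tabled entries bracketing (p1, p2).

    b1 and b2 are (lower, upper) tabled proportions around p1 and p2;
    pass (p, p) for a p that appears in the table.
    """
    t1, t2 = weight(p1, *b1), weight(p2, *b2)
    v = ((1 - t1) * (1 - t2) * table1_entry(b1[0], b2[0])
         + (1 - t1) * t2 * table1_entry(b1[0], b2[1])
         + t1 * (1 - t2) * table1_entry(b1[1], b2[0])
         + t1 * t2 * table1_entry(b1[1], b2[1]))
    return round(v, 2)  # only two digits are carried to Table 2

print(interpolated_entry(0.04, 0.11, (0.04, 0.04), (0.10, 0.12)))  # 0.37
print(interpolated_entry(0.13, 0.29, (0.12, 0.14), (0.28, 0.30)))  # 0.56
```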

When both of the obtained ps lie approximately halfway between those 
found in the table a method simpler than interpolation but equally ade- 
quate can be used to reduce error. It is based on the fact that each ob- 
tained proportion can be either increased or decreased when one in the 
table is selected as a substitute. The following two rules indicate which
way to change each obtained p so as to avoid the larger of the possible
errors. The rules are: (1) When both obtained ps are greater than .50, or
both are less than .50, increase one and decrease the other in changing
to ps found in the
table. (2) When one obtained p is greater than .50 and the other is less 
than .50, increase or decrease both obtained ps when changing to those in 
the table. (These rules are derived from an error formula the presenta- 
tion of which would be too lengthy for this paper.) These rules can be 
stated as a single one in terms of position of alternative values in Table 1. 
Whenever there are two possible choices for each p there will be four 
possible values in the table, these four forming a square. To take the 
previous example, if the obtained ps were .13 and .29, the pairs in the 
table which might be used would be .12 and .28, .12 and .30, .14 and .28, 
and .14 and .30. The $\sqrt{p_1 q_1 + p_2 q_2}$ entries for these four pairs are
arranged in the table as shown below:


.55   .57
.56   .57


The rule is: always choose either the upper-right or lower-left of the 
four alternatives. Thus one would choose either .56 or .57; the true 
value is .5648. Choosing the upper-right or lower-left alternative will 
always result from the two rules previously given, so this single rule is
simply another way of stating the first two.
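The two selection rules can be expressed as a small function. This sketch makes two assumptions: each obtained p is bracketed by the two nearest tabled proportions, and neither p is exactly .50, a case the rules do not address. For the pair .13 and .29, it returns the two substitutes whose entries are .57 and .56. These are the upper-right and lower-left entries of the block shown above. `allowed_substitutes` and `exact_value` are hypothetical names.

```python
import math

def exact_value(p1, p2):
    """Unrounded sqrt(p1*q1 + p2*q2)."""
    return math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))

def allowed_substitutes(p1, p2, b1, b2):
    """Apply the two rules to bracketed proportions.

    b1 and b2 are (lower, upper) tabled proportions around p1 and p2.
    Rule 1: both ps on the same side of .50 -> raise one, lower the other.
    Rule 2: ps on opposite sides of .50 -> raise both or lower both.
    Assumes neither p equals .50 exactly.
    """
    (lo1, hi1), (lo2, hi2) = b1, b2
    if (p1 - 0.5) * (p2 - 0.5) > 0:      # same side of .50
        return [(hi1, lo2), (lo1, hi2)]
    return [(lo1, lo2), (hi1, hi2)]        # opposite sides

for pair in allowed_substitutes(0.13, 0.29, (0.12, 0.14), (0.28, 0.30)):
    print(pair, round(exact_value(*pair), 2))
print(round(exact_value(0.13, 0.29), 4))  # 0.5648
```

For this example, both permitted pairs lie within about .003 of the true value. The two excluded pairs (.12 with .28, and .14 with .30) each miss by about .01.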







If the number of cases (from which proportions are obtained) does 
not appear in Table 2, linear interpolation in that table will eliminate 
most of the error. If interpolation is not used, choose the higher of the 
two possible Ns in the table if the actual N is halfway between. If 
greater accuracy is desired, new columns can be inserted in the table.
This is done by multiplying the numbers in the first column of the table
by $\sqrt{1/N}$ for the desired N.
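Both interpolation in N and the advice to choose the higher N can be illustrated numerically. In this sketch, the printed columns N = 100 and N = 120 are hypothetical choices made for the example and are not taken from the published Table 2. Entries are rounded to three decimals, as in that table. The function names are also hypothetical.

```python
import math

def table2_entry(v, n):
    """A Table 2 entry: the Table 1 value v multiplied by sqrt(1/N), three decimals."""
    return round(v * math.sqrt(1 / n), 3)

def interpolate_in_n(v, n, n_lo, n_hi):
    """Linear interpolation between two printed columns, n_lo < n < n_hi."""
    t = (n - n_lo) / (n_hi - n_lo)
    return round((1 - t) * table2_entry(v, n_lo) + t * table2_entry(v, n_hi), 3)

# Hypothetical printed columns N = 100 and N = 120; actual N = 105.
print(interpolate_in_n(0.56, 105, 100, 120), table2_entry(0.56, 105))

# Without interpolation, for an N halfway between (110), the higher column is
# closer, because sqrt(1/N) changes more slowly as N grows.
exact = 0.56 * math.sqrt(1 / 110)
print(abs(table2_entry(0.56, 100) - exact), abs(table2_entry(0.56, 120) - exact))
```

Calling `table2_entry` directly for the actual N amounts to inserting a new column, as the text describes.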


Amount of Error in Using the Tables 


The amount of error which will result when the tables are used as 
suggested above is hard to estimate, but a rough guess can be made by 
estimating first the error obtained when neither interpolation nor the 
method for choosing among alternative ps is used. 

Whenever both obtained ps lie between proportions given in Table 1 
the true value of $\sqrt{p_1 q_1 + p_2 q_2}$ will be between the two most extreme of
the four possible $\sqrt{p_1 q_1 + p_2 q_2}$ entries, which will be adjacent and form a
square in the table as previously illustrated. These two extreme values, 
or limits between which the true value falls, are always the upper-left 
and lower-right ones of the four. Thus any p between .12 and .14
combined with any p between .28 and .30 will give a value of $\sqrt{p_1 q_1 + p_2 q_2}$
between .55 and .57, the upper-left and lower-right figures of the block 
illustrated above. The differences between diagonally adjacent entries 
in Table 1 are therefore crucial in estimating error in the use of the tables. 
The maximum possible error will occur when each obtained p is halfway 
between the two nearest ones in the table. This maximum possible 
error was computed at 30 representative points in the table. In one case 
it amounted to 75% of the difference between the limiting values, in 6 
cases it amounted to 61-70% of this difference, and in 23 of the 30 cases 
it was 60% or less of this difference. The extreme case was the one which 
has been used so frequently as an illustration. With obtained ps of .13 
and .29 the true value of $\sqrt{p_1 q_1 + p_2 q_2}$ is .5648, which is approximately
75% of the distance from the limiting value of .55 to that of .57. 
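The error bound for the example of .13 and .29 can be recomputed directly. This sketch takes the four two-digit entries bracketing the obtained pair and treats the smallest and largest as the limiting values. It then expresses the largest error of any of the four as a fraction of the spread between those limits. It illustrates the calculation described in the text and does not reconstruct the author's 30 representative points. `max_error_fraction` is a hypothetical name.

```python
import math

def exact_value(p1, p2):
    """Unrounded sqrt(p1*q1 + p2*q2)."""
    return math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))

def max_error_fraction(p1, p2, b1, b2):
    """Largest error from using any of the four bracketing two-digit entries,
    as a fraction of the spread between the limiting (smallest, largest) entries."""
    corners = [round(exact_value(x, y), 2) for x in b1 for y in b2]
    true_value = exact_value(p1, p2)
    spread = max(corners) - min(corners)
    worst = max(abs(c - true_value) for c in corners)
    return worst / spread

print(round(max_error_fraction(0.13, 0.29, (0.12, 0.14), (0.28, 0.30)), 2))  # 0.74
```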

The distribution of differences between the diagonally adjacent 
(upper-left to lower-right) entries in Table 1 is given in Table 3. A 
majority of these differences, 67%, are .02 or less and only 7% are .04 or 
greater. The information given above suggests that at the great ma- 
jority of points in the table the maximum error will be 60% or less of 
these differences. If this is true, at nearly 67% of the points in the table
the maximum error will be .012 or less, and at 7% it will be .024 or more.
The average error can only be guessed. If the errors are evenly distributed,
it would be half the maximum error. In this case it would be .006 or
less at 67% of the points in the table, .009 or less at 93% of the points,
and .012 or over at 7%. This rough estimate of average error is probably
high because the errors would not be evenly distributed. Whenever the
changes in the two ps are in the direction indicated in the rules previously
given, the error in one p will be in the direction of cancelling the error in
the other p. This will happen in half the cases, that is, in two out of the
four possible combinations of plus or minus errors in each p. This lessening
of error in half the cases will cause the distribution of errors to be
bell-shaped in some degree rather than flat, and the average error will
therefore be something less than half the maximum error.

Table 3
The Distribution of Differences Between Diagonally
Adjacent Entries in Table 1

Diff.     f       %      c%
 .01     34    11.3    11.3
 .02    166    55.3    66.6
 .03     80    26.7    93.3
 .04     15     5.0    98.3
 .05      2      .7    99.0
 .06      2      .7    99.7
 .07      1      .3   100.0

         N = 300
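The percentages in Table 3, and the rough error estimates drawn from them, can be recomputed from the frequencies. The sketch below follows two assumptions from the text: the maximum error is 60% of the diagonal difference, and a flat error distribution gives an average error of half the maximum. Its cumulative percentages add up unrounded values. As a result, the .02 row shows 66.7, where the printed c% column, which appears to add the rounded percentages, shows 66.6. `summarize` and `TABLE3` are hypothetical names.

```python
# Frequencies of differences between diagonally adjacent Table 1 entries (Table 3).
TABLE3 = {0.01: 34, 0.02: 166, 0.03: 80, 0.04: 15, 0.05: 2, 0.06: 2, 0.07: 1}

def summarize(freqs, max_fraction=0.6):
    """Percent and cumulative percent for each difference, plus the maximum
    error (max_fraction of the difference) and a flat-distribution average
    error (half the maximum), as in the rough estimate in the text."""
    n = sum(freqs.values())
    rows, cum = [], 0.0
    for diff in sorted(freqs):
        pct = 100.0 * freqs[diff] / n
        cum += pct
        max_err = max_fraction * diff
        rows.append((diff, freqs[diff], pct, cum, max_err, max_err / 2))
    return n, rows

n, rows = summarize(TABLE3)
print("N =", n)
for diff, f, pct, cum, max_err, avg_err in rows:
    print(f"{diff:.2f} {f:4d} {pct:5.1f} {cum:6.1f}   max {max_err:.3f}   avg {avg_err:.4f}")
```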

It should be recalled now that the maximum errors will never occur 
if the methods previously suggested are used and also that interpolation 
will reduce all errors. It is believed that the error obtained without the 
use of interpolation and rules for the selection of entries in Table 1 has 
been conservatively (over-) estimated and that in this case the values in 
Table 1 will be correct to within one unit in the second digit. When 
the suggested methods for reducing error are used these values will 
probably be significant to the two digits given. If this is true, the stand- 
ard errors in Table 2 will also be significant to two digits for the Ns given. 


Received September 20, 1946. 





Book Reviews 


McFarland, Ross A. Human factors in air transport design. New York: 
McGraw-Hill, 1946. Pp. xix+670. 


A failure to give adequate forethought in designing aircraft to the re- 
lationship between human and physical variables in flight has given rise 
to major problems involving: (1) the safety and efficiency of operations as 
affected by the performance of the flight crew, (2) the comfort and well- 
being of passengers, and (3) costly modifications of transports after deli- 
very. This is Dr. McFarland’s main thesis. As a contribution to the 
solution of these problems, he systematically describes and analyzes ten 
major considerations of importance to personnel in transport design, 
giving: (1) specific illustrations of faults revealed in scheduled operations, 
(2) a brief description of the physical nature of the relevant variables en- 
countered in flight, (3) an analysis of the relationship between physical 
stimulus and the sensory reactions of crew and passengers, and (4) re- 
commendations concerning the limits within which the physical vari- 
ables should be controlled. 

An introductory chapter on the characteristics of the major air tran- 
sports in current use or under construction and the trends in future design 
is of interest to the air-minded layman. The relative merits of land 
planes and flying boats, gas turbine and jet propulsion motors, and tech- 
nical developments in safety devices are presented in tabular form or 
briefly discussed. There follows a fundamental chapter on high altitude 
operations and pressurized cabins, with a fairly complete discussion of the 
physiological and psychological reactions to anoxia. Frequent reference 
is made to altitudes flown in routine commercial flights, altitudes requiring 
the use of oxygen, and methods of administering it. An analysis of the 
advantages and disadvantages of pressurized cabins favors this feature 
as the best solution for passenger comfort and efficient operation; but it is 
recommended that, until the hazards of sudden decompression are mini- 
mized, flights be kept below 25,000 ft. 

In two chapters dealing with the control of ventilation, temperature,
humidity, and noxious gases, detailed, specific recommendations are made
after a thorough analysis of defects and discomforts encountered in rou- 
tine flights. A similar systematic treatment is accorded the problems of 
controlling noise and vibration in the cabins of air transports. The re- 
ported success in the reduction of noise levels in such military planes as 

the B-29 by appropriate design supports the recommendation for similar 
reductions in commercial transports. It is further recommended that 
acoustical structures be designed so as to provide, wherever possible, ther- 
mal insulation as well. Chapters on acceleration, motion, and flight per- 
formance and on the design of cockpit and control cabins give recom- 
mendations on such matters as the minimizing of air sickness, the use of 
automatic pilots, simplification of instrument panels, cockpit illumination, 
and the removal of ice and glare from windshields. The remaining chap- 
ters deal with passenger accommodations, the control of insects and air- 
borne diseases, and the prevention of accidents. Recommendations are 
made for the installation of public address systems, radios, and motion 
picture projectors. More adequate safety devices are recommended, but 
greater emphasis is put on larger margins of safety in design to cover 
human variability and fallibility. 

For the task of integrating the contributions of the physiologist, the 
psychologist, and the aeronautical engineer, Dr. McFarland is probably 
uniquely qualified. He has done an exhaustive task in a thoroughly 
workmanlike manner. Though the pace is at times pedestrian, his con- 
tribution is solid and important. 


G. Milton Smith 
The City College of New York 


Edgerton, Harold A. (Editor). Ohio State and occupations. Columbus:
Ohio State University Press, 1945. Pp. viii + 198. $1.50.

This paper-backed volume contains brief descriptions of occupations 
relevant to curricula or majors offered at Ohio State. 

This collection of job descriptions has the virtue of focusing attention 
on the vocational implications of the curriculum. It specifies for both 
students and faculty the potential job outlets of the training sequences 
at Ohio State. Such an informational technique is useful in building a 
guidance and instructional program which is cognizant of the dominating 
vocational motivation of the majority of college students. Noteworthy 
is the inclusion of descriptions of entry-level, skilled, and semi-professional
occupations such as architectural draftsman, shipping clerk, rodman,
milk bottling operator, and feed salesman, thus giving a realistic hint of
the types of beginning jobs which drop-outs and graduating seniors actu- 
ally may find available to them. 

The professional personnel worker, however, will require more de- 
tailed information than is available here. Information regarding requi- 
site aptitudes and interests is almost entirely lacking and its omission may 
lead the student to minimize the importance of such personal attributes 
in making a vocational choice. The quality of the materials in this
publication varies from department to department. Those prepared by the
College of Agriculture might well serve as models for other colleges and
universities wishing to make a similar compilation. 

Ohio State and occupations is a product of the joint efforts of the staff 
of the Occupational Opportunities Service and the University faculty. 
The usual apathy of faculty and, frequently, personnel officers themselves, 
toward such enterprises was somehow overcome, and worthwhile materials
were produced. The demonstration that such useful materials can be
produced locally is the major contribution of the collaborators.


Arthur H. Brayfield 
Long Beach City College, 
Long Beach, California 


Cleeton, Glen U., and Mason, Charles W. Executive ability: its discovery
and development. Yellow Springs, Ohio: Antioch Press, 1946. 540 pages.

The aim of the book is quite comprehensive—to define executive func- 
tions and give procedures for discovery and development of executive 
ability. This is a large order, especially where so much of the available 
material is of an a priori rather than empirical character. The authors 
get about as much out of this material as could be done at this juncture. 
The fact that there are many unsolved problems in this field is not their 
fault. In fact, they include some debatable questions hoping to stimulate 
research. 

About every conceivable problem in industrial psychology is men- 
tioned somewhere in the book, presumably because executives are con- 
cerned with these problems. Naturally most of these must be touched 
upon or hinted at rather superficially. However, there are plenty of re- 
ferences so that the reader can obtain more details in the appropriate 
sources. 

Psychologists, presumably, will be most interested in the portions deal- 
ing with selection and training of executives. These sections describe 
such tests as are available, likewise rating scales. They give a fair 
amount of detail as to the techniques of constructing, administering, and 
validating tests. There is some consideration of selection of traits con- 
stituting executive ability by using the pooled judgment of experts. This 
results in a pretty good list of executive traits. For this and other reasons 
the book would be worth while reading prior to undertaking research in 
this field. The authors stress emotions among executive qualifications 
more than usually is done but extend the term to include many aspects of 
personality. 

There are quite a few self-rating exercises which the reader is encour- 
aged to try such as interest tests and personality tests, with keys in the 
appendix. These would be all right for use in a college class with
appropriate interpretation by the instructor. It is questionable practice to
put them in the hands of the unguided reader who might attempt to eval- 
uate himself and attach too much significance to them. 

The authors wisely stress the importance of supervision as one phase 
of executive function and give a good outline of supervisors’ problems. 

The consideration of incentives and the training of supervisors is all 
too brief. It is interesting to note that the authors have caught on to the 
new case study method which is proving successful in supervisory training. 
There is a sensible series of suggestions in the general field of industrial re- 
lations. In the spirit of the times the authors conclude their discussion 
with “The Executive in a Democracy.” They stress the importance of 
leadership in this connection and give a list of suggested readings on social 
problems which the executive might well investigate. 

The reviewer is not sure how valuable the work will be to an actual 
executive or a prospective executive—probably not as much as the au- 
thors would hope. To the industrial psychologist or the student in this 
field, however, it is worth while to have a lot of this material brought 
together with adequate references. It would be helpful to anyone who 
is going through the literature with a view to further research in this field. 
One cannot help being impressed by the paucity of empirical material and 
the need for such further research. 


Harold E. Burtt 
Ohio State University 


Edwards, A. L. Statistical analysis for students in psychology and
education. New York: Rinehart & Company, Inc., 1946. Pp. xviii + 360.
$3.50.

The traditional textbook in elementary statistics written for students 
in psychology and education assumes a degree of mastery of elementary 
mathematics which many of these students unfortunately do not possess. 
The introductory course in statistics thus becomes an unpleasant ex- 
perience in which the student not only finds himself in a course for which 
he is not prepared, but also in a course which seems outside of his field of 
major interest and specialization. This text is a departure from tradition 
in that it is written for the person with limited mathematical background 
who does not particularly enjoy involved mathematical operations. By 
careful progression from easy to difficult items, by simplification of math- 
ematical notation, by de-emphasis of algebraic derivation and proof, by 
coding methods which reduce the labor involved in computations, and by 
selection of data from familiar fields in social psychology, the author hopes 
to reduce the emotional response to this beginning course, and to make it 
a professional course in psychology which the student will find a pleasant 
and stimulating experience. 





Book Reviews 461 


The text assumes that a review of basic mathematics is required for 
almost every student. The course therefore is introduced by a fifteen-page
chapter entitled “Survey of Rules and Principles,” which discusses
the handling of fractions, decimals, proportions, per cents, positive and 
negative numbers, numbers in series, squares and square roots, summa- 
tion, and simple equations. 

Following this short survey of rules and principles, chapters are
presented entitled: Measures of Central Tendency and Variability;
Simplifying Statistical Computations; the Product-Moment Correlation
Coefficient; the Correlation Ratio and Other Measures of Association;
Probability and Frequency Distributions; Sampling Distributions; the t Test
of Significance; Analysis of Variance: Independent Groups; Analysis of
Variance: Matched Groups; the Chi-Square Test of Significance;
Predictions and the Evaluation of Predictions; and Research and
Experimentation. The appendix contains tables for squares and square roots of
numbers from one to one thousand, for areas and ordinates of the normal
curve, for values of t, r, F, and eta-square at the one per cent and five per
cent levels of significance, and for values of chi-square. A table of
random numbers is also supplied.

The advance publicity given the book having emphasized the sum- 
mary of mathematics which it includes, the reviewer expected much more 
of a treatment than he found. The fifteen pages allotted to this topic 
give space for little more than a brief statement about each item, and give 
evidence either of excessive cutting or quick preparation. The lucid
writing which characterizes the rest of the text gives way to such teasers as
these: “The summation of a variable divided by a constant is equal to the
summation of the variable, divided by the constant” (p. 25); “To find the
number that a given proportion of a total equals, multiply the total by
the proportion” (p. 18); and “The summation of an algebraic sum of two
or more terms is the same as the algebraic sums of these terms taken
separately” (p. 24). A student who requires very much review of
mathematics will find little help from this chapter.

Except for the chapter reviewing basic mathematics, the book is
an exceedingly well-written and easily understood text. The organization
of material is in places unorthodox, but defensible as making easier
the comprehension of the material presented. The integration of small 
sample statistics as an essential part of the discussion of large sample 
methods gives the book a coherence and logic which should serve to de- 
crease the confusion which usually results when the introduction of small 
sample methods is delayed. Precision of description and definition of 
statistical terms and operations seldom suffer from the style of writing 
which Edwards uses, although there are a few statements where processes
are short-circuited as in the following: “We pool the data of the two
groups to get our hypothesis and then calculate chi-square, assuming the 
hypothesis to be true” (p. 249). 

In writing a text which would be easily understood by the beginning 
student, Edwards has also written one which adapts itself to self-study
more readily than any other book in the field. The industrial research 
worker, for example, who needs some review of statistics, or who needs a 
book to give to an ambitious but undertrained assistant, will find this 
book well suited to his needs. Students of psychology who need review 
of elementary statistics will find the book very usable. Professional psy- 
chologists who wish to become better acquainted with the elementary 
aspects of the newer types of small sample statistics will find this volume 
an excellent reference. 


Kenneth E. Clark 
University of Minnesota 


Industrial psychology and personnel practice, Vol. 2, No. 2, Industrial Wel- 
fare Division, Department of Labour and National Service, Common- 
wealth of Australia, June, 1946 (pp. 48). 


This publication, begun in 1945, represents the Australian counterpart 
and combination of our Journal of Applied Psychology, the Personnel 
Journal, and Psychological Abstracts. It is limited to the fields of in- 
dustrial psychology and personnel practice and contains original articles, 
reviews, and abstracts. As far as the reviewer has been able to determine, 
it is the only Australian publication covering these topics. It was under- 
taken by the government to stimulate the study of these subjects. The 
content of the bulletin indicates a current heavy leaning on English and 
American research. The articles suggest that modern personnel pro- 
grams as well as industrial psychology are beginning to gain status in 
Australia and are having the usual struggle of a new program. Paral- 
leling this infant growth, the new journal is now clothed in only a few 
mimeographed pages but it is predicted that it may become an important 
source of contributions in the future. 


Brent Baxter 
Chesapeake and Ohio Railway Co., 
Cleveland, Ohio 














° ; 
‘ 








